"Perception,  then,  emerges  as  that  relatively  primitive , 
partly autonomous, institutionalized, ratiomorphic subsystem of cognition
which achieves prompt and richly detailed orientation habitually concerning
the vitally relevant, mostly distal aspects of the environment on the basis
of mutually vicarious, relatively restricted and stereotyped, insufficient
evidence in uncertainty-geared interaction and compromise, seemingly
following the highest probability for smallness of error at the expense of
the highest frequency of precision." - From "Perception and the
Representative Design of Psychological Experiments," by Egon Brunswik.

"That's a simplification. Perception is standing on the sidewalk,
watching all the girls go by." - From "The New Yorker",
December 19, 1959.
PREFACE


It is only after much hesitation that the writer has reconciled himself
to the addition of the term "neurodynamics" to the list of such recent
linguistic artifacts as "cybernetics", "bionics", "autonomics", "biomimesis",
"synnoetics", "intelectronics", and "robotics". It is hoped that by selecting
a term which more clearly delimits our realm of interest and indicates its
relationship to traditional academic disciplines, the underlying motivation of
the perceptron program may be more successfully communicated. The term
"perceptron", originally intended as a generic name for a variety of theoretical
nerve nets, has an unfortunate tendency to suggest a specific piece of hardware,
and it is only with difficulty that its well-meaning popularizers can be persuaded
to suppress their natural urge to capitalize the initial "P". On being asked,
"How is Perceptron performing today?" I am often tempted to respond, "Very
well, thank you, and how are Neutron and Electron behaving?"

That the aims and methods of perceptron research are in need of
clarification is apparent from the extent of the controversy within the scientific
community since 1957, concerning the value of the perceptron concept. There
seem to have been at least three main reasons for negative reactions to the
program. First was the admitted lack of mathematical rigor in preliminary reports.
Second was the handling of the first public announcement of the program
in 1958 by the popular press, which fell to the task with all of the exuberance and
sense of discretion of a pack of happy bloodhounds. Such headlines as "Frankenstein
Monster Designed by Navy - Robot That Thinks" (Tulsa, Oklahoma Times)
were hardly designed to inspire scientific confidence. Third, and perhaps most
significant, there has been a failure to comprehend the difference in motivation
between the perceptron program and the various engineering projects concerned
with automatic pattern recognition, "artificial intelligence", and advanced computers.
For this writer, the perceptron program is not primarily concerned with the invention
of devices for "artificial intelligence", but rather with investigating the
physical structures and neurodynamic principles which underlie "natural
intelligence". A perceptron is first and foremost a brain model, not an invention
for pattern recognition. As a brain model, its utility is in enabling us to
determine the physical conditions for the emergence of various psychological
properties. It is by no means a "complete" model, and we are fully aware of
the simplifications which have been made from biological systems; but it is,
at least, an analyzable model. The results of this approach have already been
substantial: a number of fundamental principles have been established, which
are presented in this report, and these principles may be freely applied,
wherever they prove useful, by inventors of pattern recognition machines and
artificial intelligence systems.


The purpose of this report is to set forth the principles, motivation,
and accomplishments of perceptron theory in their entirety, and to provide a
self-sufficient text for those who are interested in a serious study of
neurodynamics. The writer is convinced that this is as definitive a treatment as can
reasonably be accomplished in a volume of manageable size. Since this volume
attempts to present a consistent theoretical position, however, the student
would be well advised to round out his reading with several of the alternative
approaches referenced in Part I. Within the last year, a number of comprehensive
reviews of the literature have appeared, which provide convenient jumping-off
points for such a study.*

* See, for example, Minsky's article, "Steps Toward Artificial Intelligence",
Proc. I.R.E., 49, January, 1961, for an entertaining statement of the views of
the loyal opposition, which includes an excellent bibliography.

The work reported here has been performed jointly at the Cornell
Aeronautical Laboratory in Buffalo and at Cornell University in Ithaca. Both
programs have been under the support of the Information Systems Branch of the
Office of Naval Research -- the Buffalo program since July, 1957, and the Ithaca
program since September, 1959. A number of other agencies have contributed
to particular aspects of the program. The Rome Air Development Center has
assisted in the development of the Mark I perceptron, and we are indebted to
the Atomic Energy Commission for making the facilities of the NYU computing
center available to us.

A great many individuals have participated in this work. R. D. Joseph
and H. D. Block, in particular, have contributed ideas, suggestions, and
criticisms to an extent which should entitle them to co-authorship of several
chapters of this volume. I am especially indebted to both of them for their
heroic performance in proofreading the mathematical exposition presented here,
a task which has occupied many weeks of their time, and which has saved me from
committing many a mathematical felony. Carl Kesler, Trevor Barker, David
Feign, and Louise Hay have rendered invaluable assistance in programming the
various digital computers employed on the project, while the engineering work
on the Mark I was carried out primarily by Charles Wightman and Francis Martin
at C.A.L. The experimental program with the Mark I was carried out by John
Hay. In addition to all of those who have contributed directly to the research
activities, the writer is indebted to Professors Mark Kac, Barkley Rosser, and
other members of the Cornell faculty for their administrative support and
encouragement, and to Alexander Stieber, W. S. Holmes, and the administrative staffs
of the Cornell Aeronautical Laboratory and the Office of Naval Research whose
confidence and support have carried the program successfully through its
infancy.

Frank  Rosenblatt 

15 March 1961
PART I


DEVELOPMENT  OF  BASIC  CONCEPTS 


1.  INTRODUCTION 


The theory to be presented here is concerned with a class of
"brain models" called perceptrons. By "brain model" we shall mean
any theoretical system which attempts to explain the psychological functioning
of a brain in terms of known laws of physics and mathematics, and known
facts of neuroanatomy and physiology. A brain model may actually be constructed,
in physical form, as an aid to determining its logical potentialities
and performance; this, however, is not an essential feature of the model
approach. The essence of a theoretical model is that it is a system with
known properties, readily amenable to analysis, which is hypothesized to
embody the essential features of a system with unknown or ambiguous
properties -- in the present case, the biological brain. Brain models of
different types have been advanced by philosophers, psychologists, biologists,
and mathematicians, as well as electrical engineers (cf. Refs. 17, 31, 33,
54, 59, 61, 74, 91, 105, 109). The perceptron is a relative newcomer to this
field, having first been described by this writer in 1957 (Ref. 78). Perceptrons
are of interest because their study appears to throw light upon the biophysics of
cognitive systems; they illustrate, in rudimentary form, some of the processes
by which organisms, or other suitably organized entities, may come to
possess "knowledge" of the physical world in which they exist, and by which
the knowledge that they possess can be represented or reported when occasion
demands. The theory of the perceptron shows how such knowledge depends
upon the organization of the environment, as well as on the perceiving
system.
At the time that the first perceptron model was proposed, the
writer was primarily concerned with the problem of memory storage in
biological systems, and particularly with finding a mechanism which would
account for the "distributed memory" and "equipotentiality" phenomena found
by Lashley and others (Refs. 48, 49, 95). It soon became clear that the
problem of memory mechanisms could not be divorced from a consideration
of what it is that is remembered, and as a consequence the perceptron became
a model of a more general cognitive system, concerned with both memory and
perception.

A perceptron consists of a set of signal generating units (or
"neurons") connected together to form a network. Each of these units, upon
receiving a suitable input signal (either from other units in the network or
from the environment), responds by generating an output signal, which may
be transmitted, through connections, to a selected set of receiving units. Each
perceptron includes a sensory input (i.e., a set of units capable of responding
to signals emanating from the environment) and one or more output units, which
generate signals which can be directly observed by an experimenter, or by an
automatic control mechanism. The logical properties of a perceptron are
defined by:

1. Its topological organization (i.e., the connections among
the signal units);

2. A set of signal propagation functions, or rules governing
the generation and transmission of signals;

3. A set of memory functions, or rules for modification of
the network properties as a consequence of activity.
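
The three defining components listed above can be made concrete with a small sketch. The Python fragment below is illustrative only: the class name `Network`, the simple threshold rule used for signal propagation, and the additive rule used for memory modification are all assumptions chosen for brevity, and are not the specific functions analyzed in later chapters.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple

Connection = Tuple[int, int]  # (sending unit, receiving unit)


@dataclass
class Network:
    """Hypothetical container for the three defining components."""
    n_units: int
    connections: List[Connection]  # 1. topological organization
    weights: Dict[Connection, float] = field(default_factory=dict)  # memory state
    threshold: float = 1.0  # assumed firing threshold, same for every unit

    def propagate(self, active: Set[int]) -> Set[int]:
        """2. Signal propagation rule (assumed): a unit generates a signal
        when the summed weights arriving from active units reach threshold."""
        totals = [0.0] * self.n_units
        for src, dst in self.connections:
            if src in active:
                totals[dst] += self.weights.get((src, dst), 0.0)
        return {u for u in range(self.n_units) if totals[u] >= self.threshold}

    def reinforce(self, active: Set[int], fired: Set[int], delta: float) -> None:
        """3. Memory rule (assumed): add `delta` to every connection that
        carried a signal from an active unit to a unit that fired."""
        for src, dst in self.connections:
            if src in active and dst in fired:
                self.weights[(src, dst)] = self.weights.get((src, dst), 0.0) + delta


# Units 0 and 1 act as sensory inputs; unit 2 is the observed output unit.
net = Network(n_units=3, connections=[(0, 2), (1, 2)],
              weights={(0, 2): 0.6, (1, 2): 0.6})
before = net.propagate({0})      # unit 0 alone is too weak to fire unit 2
both = net.propagate({0, 1})     # both inputs together reach threshold
net.reinforce({0, 1}, both, delta=0.5)
after = net.propagate({0})       # after reinforcement, unit 0 alone suffices
```

In this toy run the same stimulus (unit 0 alone) produces no output before reinforcement and an output afterwards, which is the sense in which the memory functions modify the network "as a consequence of activity".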




A  perceptron  is  never  studied  in  isolation,  but  always  as  part  of  a 
closed  experimental  system,  which  includes  the  perceptron  itself,  a  defined 
environment,  and  a  control  mechanism  or  experimenter  capable  of  applying 
well-defined  rules  for  the  modification,  or  "reinforcement"  of  the  perceptron's 
memory state. In most analyses, we are not concerned with a single perceptron,
but rather with the properties of a class of perceptrons, whose topological
organizations come from some statistical distribution. A perceptron,
as  distinct  from  some  other  types  of  brain  models,  or  "nerve  nets",  is  usually 
characterized  by  the  great  freedom  which  is  allowed  in  establishing  its 
connections,  and  the  reliance  which  is  placed  upon  acquired  biases,  rather 
than  built-in  logical  algorithms,  as  determinants  of  its  behavior. 
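
The idea of studying a class of networks drawn from a statistical distribution, rather than one hand-designed network, can be sketched as follows. Everything specific here is an assumption made for illustration: the function names `sample_topology` and `coverage`, the uniform choice of origin points, and the particular sizes used. The sketch only shows the procedure of sampling many members of a class and measuring a property averaged over them.

```python
import random
from typing import List


def sample_topology(n_sensory: int, n_internal: int, fan_in: int,
                    rng: random.Random) -> List[List[int]]:
    """Draw one network from the class: every internal unit receives
    `fan_in` connections from distinct sensory units chosen uniformly."""
    return [sorted(rng.sample(range(n_sensory), fan_in))
            for _ in range(n_internal)]


def coverage(topology: List[List[int]], n_sensory: int) -> float:
    """Fraction of sensory units connected to at least one internal unit."""
    used = {s for origins in topology for s in origins}
    return len(used) / n_sensory


rng = random.Random(0)  # fixed seed so the run is repeatable
family = [sample_topology(n_sensory=20, n_internal=5, fan_in=3, rng=rng)
          for _ in range(200)]
mean_coverage = sum(coverage(t, 20) for t in family) / len(family)
print(f"mean sensory coverage over the class: {mean_coverage:.2f}")
```

No single sampled network is of special interest here; the quantity reported is a property of the generating rule, averaged over the networks it produces.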

Because  of  a  common  heritage  in  the  philosophy,  psychology, 
physiology,  and  technology  of  the  last  few  centuries,  there  are  bound  to  be 
similarities  between  the  points  of  view  and  the  basic  assumptions  of  the 
theory  presented  here,  and  of  other  theories.  The  writer  makes  no  claim  to 
uniqueness  in  this  respect.  In  particular,  the  neuron  model  employed  is  a 
direct  descendant  of  that  originally  proposed  by  McCulloch  and  Pitts;  the 
basic  philosophical  approach  has  been  heavily  influenced  by  the  theories  of 
Hebb  and  Hayek  and  the  experimental  findings  of  Lashley;  moreover,  the 
writer's predilection for a probabilistic approach is shared with such theorists
as Ashby, Uttley, Minsky, MacKay, and von Neumann, among others.

This volume is divided into four main sections. Part I,
commencing with this introduction, attempts to review the background,
basic sources of data, concepts, and methodology to be employed in the
study of perceptrons. In Chapter 2, a brief review of the main alternative
approaches to the development of brain models is presented. Chapter 3
considers the physiological and psychological criteria for a suitable model,
and attempts to evaluate the empirical evidence which is available on several
important issues. Sufficient references to the literature are included throughout
these chapters so that the reader who requires additional background in
any of the areas discussed can use this as a guide for further reading. Part I
concludes with Chapter 4, in which basic definitions and some of the notation
to be used in later sections are presented. Parts II and III are devoted to a
summary of the established theoretical results obtained to date. In these
sections, the strategy will be to present a number of models of increasing
complexity and sophistication, with theorems and analytic results on each
model to indicate its capabilities and deficiencies. Wherever possible,
established mathematical results will be presented first, followed by empirical
evidence from simulation and hardware experiments. Part II (Chapters 5
through 14) deals with the theory of three-layer series-coupled perceptrons,
on which most work has been done to date. These systems are called "minimal
perceptrons". Part III (Chapters 15 through 20) deals with the theory of
multi-layer and cross-coupled perceptrons, where a great deal still remains
to be done, but where the most provocative results have begun to emerge.
Part IV is concerned with more speculative models and problems for future
analysis. Of necessity, the final chapters become increasingly heuristic in
character, as the theory of perceptrons is not yet complete, and new
possibilities are continually coming to light.

Part I (except for the chapter on definitions) is entirely
non-mathematical. In Part II, and most of the remainder of the text, familiarity
with the elements of modern algebra and probability theory is assumed, and
should be sufficient for most of the material. In several proofs in Part II,
and to a greater extent in Part III, analytic methods are employed, assuming
knowledge of the calculus and differential equations; an elementary acquaintance
with differential geometry would also be useful. Symbolic logic is not required
here, but the student will find it necessary for reading much of the ancillary
literature in the field.

Several  appendices  are  included  which  may  prove  helpful  for 
cross-referencing  equations,  definitions,  and  experimental  designs  which 
are  described  in  different  chapters.  Appendix  A  is  a  list  of  all  symbols  used 
in  a  standard  manner  throughout  the  volume.  Appendix  B  is  a  consolidated 
list  of  theorems  and  corollaries.  Appendix  C  lists  the  principal  equations 
used in the analysis of performance, and basic quantitative functions. Appendix
D contains a summary of the experiments used for testing and comparing
different  perceptrons.  These  experiments  are  referred  to  by  number, 
throughout  the  text,  and  are  described  in  detail  as  they  are  first  introduced. 


2. HISTORICAL REVIEW OF ALTERNATIVE APPROACHES


2.1  Approaches  to  the  Brain  Model  Problem 


There are at least two basic points, which are fundamental to a
theory of brain functioning, on which most of the present-day theorists seem
to be in agreement. First is the assumption that the essential properties of
the brain are the topology and the dynamics of impulse-propagation in a network
of nerve cells, or neurons. This has been contested by a few theorists
who hold that the individual cells and their properties are less important than
the bulk properties and electrical currents in the cortical medium as a whole
(cf. Kohler, Ref. 45). The "neuron doctrine", however, has now been
accepted with sufficient universality that it need not be considered as an
issue in this report (Bullock, Ref. 11). It will be assumed that the essential
features of the brain can be derived in principle from a knowledge of the
connections and states of the neurons which comprise it. Secondly, there is
general agreement that the information-handling capabilities of biological
networks do not depend upon any specifically vitalistic powers which could
not be duplicated by man-made devices. This also has occasionally been
questioned, even today, by such neurologists as Eccles (Ref. 18) who
advocate a dualistic approach in which the mind interacts with the body.
Nonetheless, all currently known properties of a nerve cell can be simulated
electronically with readily available devices. It is significant that the
individual elements, or cells, of a nerve network have never been demonstrated
to possess any specifically psychological functions, such as "memory",
"awareness", or "intelligence". Such properties, therefore, presumably
reside in the organization and functioning of the network as a whole, rather
than in its elementary parts. In order to understand how the brain works, it
thus becomes necessary to investigate the consequences of combining simple
neural elements in topological organizations analogous to that of the brain.
We are therefore interested in the general class of such networks, which
includes the brain as a special case.

While there is substantial agreement up to this point, theorists
are divided on the question of how closely the brain's methods of storage,
recall, and data processing resemble those practised in engineering today.
On the one hand, there is the view that the brain operates by built-in
algorithmic methods analogous to those employed in digital computers, while
on the other hand, there is the view that the brain operates by non-algorithmic
methods, bearing little resemblance to the familiar rules of logic and
mathematics which are built into digital devices (cf. von Neumann, Ref. 105). The
advocates of the second position (this writer included) maintain that new
fundamental principles must be discovered before it will be possible to formulate an
adequate theory of brain mechanisms. It is suggested that probabilistic and
adaptive mechanisms are particularly important here. This does not mean
that the actual biological nervous system is strictly one type of device or
the other; the issue concerns the matter of emphasis, as to whether the brain
is primarily a more or less conventional computing mechanism, in which
statistical or adaptive processes play an incidental and non-essential role,
or whether the brain is so dependent upon such processes that a model which
fails to take them into account will find itself unable to account for
psychological performance.
These two points of view are associated with two basically
different procedures for studying the mechanisms of the brain and for the
development of brain models. The first procedure will be called the
monotypic model approach; it amounts to the detailed logical design of a
special-purpose computer to calculate some predetermined "psychological function"
such as the result of a recognition algorithm, or a stimulus transformation,
which is postulated as a plausible function for a nerve net to calculate. The
physical properties of this computer are then compared with those of the
brain, in the hopes of finding resemblances. The second procedure will be
called the genotypic model approach. Instead of beginning with a detailed
description of functional requirements and designing a specific physical
system to satisfy them, this approach begins with a set of rules for generating
a class of physical systems, and then attempts to analyse their performance
under characteristic experimental conditions to determine their common
functional properties. The results of such experiments are then compared
with similar observations on biological systems, in the hopes of finding a
behavioral correspondence. It is the purpose of this chapter to review the
historical development and current status of these two alternative
"philosophies of approach" to the brain model problem.

2.2 Monotypic Models

In the monotypic model approach, the theorist generally begins
by defining as accurately as possible the performance required from his
model. For example, he may specify a data processing operation, an
input-output or stimulus-response function, or a remembering and
regenerating operation. In one typical model, the system is required to
normalize the size and position of a visual image, and to compare functions
of this normalized image with certain stored quantities required for
identification (Ref. 71). Given a description of the required performance in
sufficiently precise terms, the theorist then proceeds to design a computing
machine or control system embodying the required function, generally limiting
himself to the use of a set of modular switching devices which are analogous
to biological neurons in their properties. It is this last constraint which
distinguishes the nerve net theorist from any other designer of special
purpose computers confronted with the same problem. It is hoped that a
network which consists of neuron-like elements, and is capable of computing
the required functions, will be found to resemble a biological nerve-net in its
organization and the computational principles employed.

While the simulation of animals, saints, and chessplayers by
animated machines and clockwork devices goes back many centuries, the
idea of constructing such devices out of simple logical elements with
neuron-like properties is a relatively recent one, and received its first impetus from
two sources: First, Turing's paper "On Computable Numbers", in 1936, and
the subsequent development of stored-program digital computers by von
Neumann and others during the 1940's (Refs. 12, 100) gave rise to an
impressive family of "universal automata", capable of executing programs
which would enable them to perform any computation whatsoever with only
the simplest of logical devices being employed as "building blocks". Second,
the Chicago group of mathematical biophysicists which grew up about
Rashevsky after the publication of his "Mathematical Biophysics" in 1938
(Ref. 73) began to investigate the manner in which "nerve nets" consisting of
formalized neurons and connections might be made to perform psychological
functions. Householder, Landahl, Pitts, and others made notable contributions
to this effort during the late 1930's and early 1940's (Refs. 35, 69, 70).

In  1943,  the  doctrine  and  many  of  the  fundamental  theorems  of  this 
approach  to  nerve  net  theory  were  first  stated  in  explicit  form  by  McCulloch 
and  Pitts,  in  their  well-known  paper  on  "A  Logical  Calculus  of  the  Ideas 
Immanent  in  Nervous  Activity".  The  fundamental  thesis  of  the  McCulloch- 
Pitts  theory  is  that  all  psychological  phenomena  can  be  analyzed  and  understood 
in  terms  of  activity  in  a  network  of  two-state  (all-or-nothing)  logical  devices. 
The  specification  of  such  a  network  and  its  propositional  logic  would,  in  the 
words  of  the  writers,  "contribute  all  that  could  be  achieved"  in  psychology, 
"even  if  the  analysis  were  pushed  to  ultimate  psychic  units  or  'psychons', 
for  a  psychon  can  be  no  less  than  the  activity  of  a  single  neuron.  .  .  The  'all- 
or-none'  law  of  these  activities,  and  the  conformity  of  their  relations  to 
those  of  the  logic  of  propositions,  insure  that  the  relations  of  psychons  are 
those  of  the  two-valued  logic  of  propositions."  (Ref.  57).  Despite  the 
apparent  adherence  to  an  outdated  atomistic  psychological  approach,  there 
is  an  important  contribution  in  the  recognition  that  the  proposed  axiomatic 
representation  of  neural  elements  and  their  properties  permits  strict  logical 
analysis  of  arbitrarily  complicated  networks  of  such  elements,  and  that 
such  networks  are  capable  of  representing  any  logical  proposition  whatever. 

As von Neumann states in a summary of the McCulloch-Pitts model
(Ref. 103), "The 'functioning' of such a network may be defined by singling
out some of the inputs of the entire system and some of its outputs, and
then describing what original stimuli on the former are to cause what ultimate
stimuli on the latter. . . McCulloch and Pitts' important result is that any
functioning in this sense which can be defined at all logically, strictly, and
unambiguously in a finite number of words can also be realized by such a
formal neural network."
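
A minimal sketch may help show how two-state units of this kind can represent logical propositions. It assumes the usual reading of the McCulloch-Pitts neuron (all-or-none output, a fixed excitation threshold, and an absolute inhibitory veto). The function names, the use of a constantly active excitatory line to build NOT, and the omission of synaptic delay (each stage is treated as instantaneous) are simplifying assumptions made for this illustration only.

```python
from itertools import product
from typing import Sequence


def mp_neuron(excitatory: Sequence[int], inhibitory: Sequence[int],
              threshold: int) -> int:
    """Two-state unit: returns 1 only if no inhibitory input is active and
    at least `threshold` excitatory inputs are active."""
    if any(inhibitory):
        return 0
    return 1 if sum(excitatory) >= threshold else 0


def AND(a: int, b: int) -> int:
    return mp_neuron([a, b], [], threshold=2)


def OR(a: int, b: int) -> int:
    return mp_neuron([a, b], [], threshold=1)


def NOT(a: int) -> int:
    # Assumption: a constantly active excitatory line, vetoed whenever a is 1.
    return mp_neuron([1], [a], threshold=1)


def XOR(a: int, b: int) -> int:
    # A multi-stage net: (a AND NOT b) OR (NOT a AND b).
    return OR(AND(a, NOT(b)), AND(NOT(a), b))


truth_table = {(a, b): XOR(a, b) for a, b in product((0, 1), repeat=2)}
print(truth_table)
```

Since AND, OR, and NOT suffice to express any finite truth table, composing such units in stages is one way to see why networks of this kind can represent arbitrary propositions over their inputs.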

A  great  variety  of  subsequent  models  have  made  use  of  this 
axiomatic  representation,  which  we  now  refer  to  as  the  "McCulloch-Pitts 
neuron".  As  stated  in  the  original  paper  (Ref.  57),  the  basic  assumptions  in 
this representation are:

"1. The activity of the neuron is an 'all-or-none' process.

2. A certain fixed number of synapses must be
excited within the period of latent addition in
order to excite a neuron at any time, and this
number is independent of previous activity and
position on the neuron.

3.  The  only  significant  delay  within  the  nervous 
system  is  synaptic  delay. 

4.  The  activity  of  any  inhibitory  synapse  absolutely 
prevents  excitation  of  the  neuron  at  that  time. 

5.  The  structure  of  the  net  does  not  change  with  time." 
These postulates are such as to rule out memory except in the form of
modifications of perpetual activity or circulating loops of impulses in the
network. Any non-volatile memory, such that the functioning of the network
at a given time depends upon previous activity even though a period of total
inactivity has intervened, is impossible in a McCulloch-Pitts network.
However, a McCulloch-Pitts network can always be constructed which will
embody whatever input-output relations might be realized by a system with
an arbitrary memory mechanism, provided activity is allowed to persist in
the network.
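
The distinction between memory held as circulating activity and non-volatile memory can be illustrated with a single unit whose output is fed back to its own input. This is a sketch under stated assumptions: time advances in steps of one synaptic delay, the unit has a threshold of one, and the names `step` and `run` are hypothetical helpers introduced only for this example.

```python
from typing import List, Tuple


def step(state: int, set_input: int, inhibit: int) -> int:
    """One synaptic delay for a single unit whose output feeds back to itself.
    The unit fires next if it is not inhibited and either its own previous
    output or the external input is active (threshold of one)."""
    if inhibit:
        return 0
    return 1 if state + set_input >= 1 else 0


def run(events: List[Tuple[int, int]]) -> List[int]:
    """Apply a sequence of (set_input, inhibit) pairs, one per time step."""
    state, history = 0, []
    for set_input, inhibit in events:
        state = step(state, set_input, inhibit)
        history.append(state)
    return history


# A single pulse at step 2 keeps circulating until an inhibitory pulse at step 5.
history = run([(0, 0), (1, 0), (0, 0), (0, 0), (0, 1), (0, 0)])
print(history)  # [0, 1, 1, 1, 0, 0]
```

The record of the input pulse exists only as long as the loop stays active. Once activity has been interrupted, nothing in the fixed structure of the net distinguishes it from a net that never received the pulse.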

Later  writers,  notably  Kleene  (Ref.  43)  have  considered  in 
more  detail  the  kinds  of  events  which  can  be  represented  by  networks  of 
McCulloch-Pitts  neurons.  The  only  important  limitation  is  that  events 
whose  definition  depends  upon  the  choice  of  a  temporal  origin  point,  or 
events  which  extend  infinitely  into  the  past,  may  not  be  representable  by 
outputs  from  finite  networks.  Any  event  which  can  be  described  as  one  of 
a  definite  set  of  possible  input  sequences  over  a  finite  period  of  time  can  be 
represented.  In  particular,  any  events  which  might  conceivably  be  recognized 
by a biological system can be represented by outputs of networks of
McCulloch-Pitts neurons.

In  later  papers  by  Pitts  and  McCulloch  (Ref.  71)  and  by 
Culbertson  (Refs.  16,  17)  specific  automata  designed  to  perform  actual 
"psychological"  functions  such  as  pattern  recognition,  have  been  described. 
Culbertson,  in  particular,  has  carried  out  such  designs  in  explicit  detail  for 
a  large  number  of  interesting  problems.  The  approach  which  he  advocates 
is expounded in his 1950 work on "Consciousness and Behavior" as
follows:

"Neuroanatomy and neurophysiology have not yet developed
far enough to tell us the detailed interconnections holding
within human or animal nets. . . Consequently, ... we cannot
start with specified nerve nets and then in a straightforward
way determine their properties. Instead, it is the reverse
problem which always occurs in dealing with organic behavior.
We are given at best the vaguely defined properties of an
unknown net and from these must determine what the structure
of that net might possibly be. In other words, we know, at
least in a rough way, what the net does (as this appears in
the behavior of the animal or man) and from this information
we have to figure out what structure the net must have. . . Our
investigation passes through two stages. In the first stage --
the behavioristic inquiry -- we ignore the inner constituents,
i.e., the nervous system and its activity, and concentrate
our attention instead on the observable relations between the
stimuli affecting the organism and the responses to which
these stimuli give rise. . . This makes the second stage -- the
functional inquiry -- possible. Here, as Northrop says, we
concentrate our attention on the inner (throughput) constituents
of the system and point out the ways in which the
receptor cells, central cells, and effector cells could be
interconnected so that the input and output relations. . . would
be those discovered in stage 1."

While  such  a  program  can  hardly  be  criticized  on  logical  grounds, 
it  appears  pragmatically  to  have  fallen  short  of  the  proposed  goals.  Starting 
rather  suddenly,  with  the  development  of  automata  theory  in  the  late  1930's, 
the  ready  applicability  of  symbolic  logic  brought  this  approach  to  early 
mathematical  sophistication.  After  the  first  flood  of  proposed  models, 
further  progress  has  been  disappointingly  trivial,  and  returns  seem  to  be 
diminishing  rapidly.  The  promised  biological  "explanations"  have  been 
particularly  lacking.  In  this  writer's  opinion,  there  are  at  least  five  main 
reasons for this:

(1) There is a lack of sufficiently well defined psychological
functions as a starting point. The approach requires
essentially full knowledge of input-output relations for the
behavior of an organism, and such knowledge is not
available for any biological species.

(2)  Constructed  solutions  generally  show  poor  correspondence 
to  known  conditions  of  neuroanatomy  and  neuroeconomy; 
the  numbers  of  neurons  required  often  exceed  those  in 
biological  nervous  systems,  and  the  logical  organization 
generally  requires  a  precision  of  connections  which 
appears  to  be  absent  in  the  brain.  In  some  cases,  a 
single  misconnection  would  be  sufficient  to  make  the 
system inoperable.

(3) The models fail to yield general laws of organization.
A monotypic model is in general overdetermined,
corresponding  at  best  to  a  biological  phenotype, 
rather  than  a  species  as  a  whole;  its  specification  in 
the  form  of  a  detailed  "wiring  diagram"  frequently 
misses  essentials  in  a  plethora  of  detail.  Unique 
solutions  for  the  proposed  functions  are  generally 
lacking  and  an  enormous  variety  of  models  can  be 
generated  which  appear  to  solve  the  same  problem 
equally  well.  Therefore,  unless  the  system  is  actually 
tested  against  its  biological  counterpart,  nothing  is 
gained  by  a  detailed  construction  of  the  model  except  a 
further  confirmation  of  an  existence  theorem  which  is 
already  well  established. 

(4) The models lack predictive value. Once a particular
model  has  been  proposed,  further  analysis  can  reveal 
little  that  is  not  included  in  the  functional  description 
with  which  we  began. 

(5) The models are not biologically testable in detail.
Specific connections cannot be traced with sufficient
precision in nervous tissue to say whether or not a
particular wiring diagram is exactly realized. Consequently,
the models are fated to remain purely speculative
unless histological techniques are improved to
a highly improbable degree.

In  the  foregoing,  we  have  concentrated  on  the  line  of  models 
which  have  attempted  to  represent  the  brain  as  a  symbolic  logic  calculator, 
in which events of the outside world are represented by the firing or
non-firing of particular neurons. It is in these models that rigorous mathematical
treatment  has  been  most  successfully  achieved.  Not  all  monotypic  models 
are of this variety, however. Field theorists such as Köhler have taken
exception to the idea that psychological phenomena can be represented in
this fashion. Köhler, arguing for an isomorphic representation of perceptual
phenomena, asks (Ref. 46): "How can a cortical process such as that of a
square give rise to an apparition with certain structural characteristics, if
these characteristics are not present in the process itself? According to
Dr. McCulloch, this is actually the case. But if we follow the example of
physics, we shall hesitate to accept his view. In physics, the structural
characteristics of a state of affairs are given by the structural properties
of the factors which determine that state of affairs... Situations in physics
which depend upon the spatial distribution of given conditions never have
more, and more specific, structural characteristics than are contained in
the conditions". While Köhler's own model is not generally considered
plausible today, his criticism is a significant one, and a number of theorists,
such as Lashley (Ref. 50), MacKay (Refs. 55, 56), and Green (Ref. 28), have
been concerned with possible forms of representation of perceptual information
which would preserve the intrinsic structural features of the perceived
event rather than merely assigning an arbitrary symbol to it.

The  main  line  of  monotypic  models,  although  failing  to  provide 
a  satisfactory  brain  model,  has  left  us  a  number  of  important  analytic  tools 
and  concepts,  including  the  McCulloch-Pitts  neuron,  and  the  theorems 
concerning  the  existence  of  networks  representing  arbitrary  functions.  For 
the  actual  design  of  plausible  organizations,  however,  the  genotypic  approach 
appears to hold more promise.

2.3  Genotypic  Models 


In  the  monotypic  approach,  the  properties  of  the  components, 
or neurons, which comprise the networks are fully specified axiomatically,
and  the  topology  of  the  network  is  fully  specified  as  well.  In  the  genotypic 
approach,  the  properties  of  the  components  may  be  fully  specified,  but  the 
organization  of  the  network  is  specified  only  in  part,  by  constraints  and 
probability distributions which generate a class of systems rather than a
specific design. The genotypic approach, then, is concerned with the
properties  of  systems  which  conform  to  designated  laws  of  organization, 
rather  than  with  the  logical  function  realized  by  a  particular  system. 
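
The idea of a class of systems generated by constraints and probability distributions can be made concrete with a short Python sketch. Everything specific in it is an assumption chosen for illustration: the rule that each ordered pair of neurons is connected independently with probability `p_connect`, the probability `p_inhibitory` that a connection is inhibitory, and the choice of mean in-degree as the property studied. The point is only that the object of analysis is a statistic over many generated nets, not the wiring of any single one.

```python
import random
import statistics


def generate_net(n_neurons, p_connect, p_inhibitory, rng):
    """Draw one network from the class: each ordered pair (i, j), i != j, is
    connected with probability p_connect; a connection is inhibitory (-1)
    with probability p_inhibitory, otherwise excitatory (+1)."""
    net = {}
    for i in range(n_neurons):
        for j in range(n_neurons):
            if i != j and rng.random() < p_connect:
                net[(i, j)] = -1 if rng.random() < p_inhibitory else +1
    return net


def mean_in_degree(net, n_neurons):
    """Average number of incoming connections per neuron."""
    return len(net) / n_neurons


def class_statistics(n_samples, n_neurons, p_connect, p_inhibitory, seed=0):
    """Estimate the mean and spread of a property over the generated class."""
    rng = random.Random(seed)
    values = []
    for _ in range(n_samples):
        net = generate_net(n_neurons, p_connect, p_inhibitory, rng)
        values.append(mean_in_degree(net, n_neurons))
    return statistics.mean(values), statistics.stdev(values)
```

For example, `class_statistics(200, 50, 0.1, 0.2)` samples 200 nets of 50 neurons each; under the stated connection rule the expected mean in-degree is (50 - 1) x 0.1 = 4.9, and the individual nets scatter only slightly about that value even though no two of them share the same wiring diagram.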

This  difference  in  approach  leads  to  important  differences  in 
the  types  of  models  which  are  generated,  and  the  kinds  of  things  which  can 
be  done  with  them.  In  the  case  of  monotypic  models,  for  example,  the 
propositional  calculus  is  applicable  and  probability  theory  is  poorly  suited 
to  the  analysis  of  performance,  since  a  single  fully  deterministic  system  is 
under  consideration  which  either  does  or  does  not  satisfy  the  required 
functional  equations.  In  dealing  with  genotypic  models,  on  the  other  hand, 
symbolic logic is apt to prove cumbersome or totally inapplicable (even
though,  in  principle,  any  particular  system  which  is  generated  might  be 
expressed  by  a  set  of  logical  propositions).  In  the  analysis  of  such  models, 
the  chief  interest  is  in  the  properties  of  the  class  of  systems  which  is 
generated  by  particular  rules  of  organization,  and  these  properties  are 
best described statistically. Probability theory therefore plays a prominent
part in this approach. A second major difference is in the method of
determining  functional  characteristics  of  the  models.  In  the  monotypic 
approach,  the  functional  properties  are  generally  postulated  as  a  starting 
point.  In  the  genotypic  approach,  they  are  the  end-objective  of  analysis, 
and  the  physical  system  itself  (or  the  statistical  properties  of  the  class  of 
systems)  constitutes  the  starting  point.  This  means  that  psychological 
functions  need  not  be  determined  in  full  detail  before  setting  out  to  construct 
a  model,  and,  indeed,  it  is  hoped  that  such  models  may  help  in  answering 
open  psychological  questions. 

While the monotypic approach arose rather suddenly with the
advent  of  modern  computers  and  control  system  theory,  and  rapidly  advanced 
to  a  high  level  of  mathematical  sophistication,  the  genotypic  approach  has 
been  much  more  gradual  in  its  development,  and  has  not  yet  developed  all 
of  the  mathematical  tools  required  to  deal  adequately  with  its  problems. 

The  genotypic  models  have  been  influenced  less  by  the  engineering  sciences, 
and  more  by  physiology  and  neuroanatomy.  The  descriptive  anatomy  of  the 
nineteenth  century  laid  the  groundwork  for  modern  studies  of  localization  of 
function  in  the  brain,  and  neurologists  such  as  John  Hughlings  Jackson  noted 
the  apparent  plasticity  of  the  system  --  the  ability  of  neighboring  regions  to 
take  over  the  function  of  damaged  areas.  Pavlov  and  others  speculated  about 
possible  mechanisms  for  adaptive  modification  of  the  central  nervous  system 
in  the  early  part  of  this  century,  and  various  hypotheses  for  the  deposition  of 
"memory  traces"  were  of  interest  to  psychologists  and  physiologists  alike. 
The doctrine of equipotentiality, propounded by Lashley (Ref. 49), went even
further  in  claiming  complete  interchangeability  of  most  parts  of  the  cerebral 
cortex,  and  evidence  for  "distributed  memory"  which  suggested  that  "traces" 
must  be  more  or  less  uniformly  dispersed  throughout  the  cortical  tissue 
began  to  accumulate.  All  of  this  neurological  evidence  engendered  a  picture 
of  the  brain  as  a  relatively  undifferentiated  structure,  capable  of  undergoing 
radical  reorganization  by  means  of  unspecified  adaptive  mechanisms,  and 
showing  only  gross  anatomical  equivalence  from  one  individual  to  another. 
While  recent  work  on  localization  (Refs.  51,  65,  66,  94,  108)  has  shown 
some  surprisingly  precise  mapping  of  functions,  modern  morphological 
investigations  (Refs.  8,  52,  93)  have  borne  out  the  apparently  statistical 
organization of the "fine structure" of neurons and their interconnections.
It now seems reasonable to suppose that while there are many constraints
on the organization of neurons in the brain, which are undoubtedly essential
to the system's functioning, these constraints take the form of prohibitions,
biases, and directional preferences, rather than a specific blueprint which
must be followed to the last detail. In other words, there are enormous
numbers  of  functionally  equivalent  systems,  all  obeying  the  same  rules  of 
organization,  and  all  equally  likely  to  be  generated  by  the  genetic  mechanisms 
of  a  particular  species. 

While  the  neurologists  mentioned  above  had  a  great  deal  to  say 
about  the  observed  and  hypothetical  organization  of  the  brain,  they  were  not 
concerned  with  the  construction  of  models  in  the  sense  of  detailed  theoretical 
systems  from  which  precise  deductions  could  be  made.  Psychologists  and 
philosophers,  more  willing  to  indulge  in  speculation,  were  the  first  to  attempt 
detailed  conjectures  on  the  maturation  of  psychological  functions  in  systems 
which  might  justifiably  be  called  "brain  models".  Hebb  (Ref.  33)  and  Hayek 
(Ref.  32),  following  the  tradition  of  James  Stuart  Mill  and  Helmholtz,  have 
attempted  to  show  how  an  organism  can  acquire  perceptual  capabilities 
through a maturational process. For Hayek, the recognition of the attributes
of a stimulus is essentially a problem in classification, and his point
of view has inspired Uttley (Refs. 101, 102) to design a type of classifying
automaton which attempts to translate the approach into more rigorous
mathematical form. Hebb's model is more detailed in its biological
description, and suggests a process by which neurons which are frequently
activated together become linked into functional organizations called
"cell assemblies" and "phase sequences" which, when stimulated, correspond
to the evocation of an elementary idea or percept. While Hebb's
work is far more complete in its specification of a "model" than most
preceding  suggestions  along  this  line,  it  is  still  too  programmatic  and  too 
loose  in  its  definitions  to  permit  a  rigorous  testing  of  hypotheses.  It  should 
be  considered  more  as  a  description  of  what  a  satisfactory  model  might 
ultimately look like than as a fully formulated model in its own right. Nonetheless,
it comes sufficiently close to a detailed specification so that
Rochester  and  associates,  using  an  IBM  computer,  were  able  to  propose 
enough  of  the  missing  detail  to  put  the  cell  assembly  hypothesis  to  an 
empirical  test  (Ref.  77).  Unfortunately,  with  a  theory  so  loosely  specified, 
the  inconclusive  results  of  the  IBM  experiments  carry  little  weight  in 
evaluating Hebb's original system. Milner, in a recent paper (Ref. 58), has
attempted to update the Hebb theory, and it may be that his model can be
more readily translated into analyzable form, although this has not yet been
done.


It  is  interesting  that  one  of  the  first  applications  of  probability 
theory to brain models is due to Landahl, McCulloch, and Pitts, appearing
in  1943  along  with  the  McCulloch-Pitts  symbolic  logic  model  (Ref.  47).  In 
this  paper,  the  topology  of  the  network  is  still  assumed  to  be  a  strictly 
deterministic,  fully  known  organization,  but  impulses  are  assumed  to  be 
propagated  with  known  frequencies  but  with  uncertainties  in  their  precise 
timing.  A  theorem  is  stated  which  permits  the  substitution  of  frequencies 
for  symbols  in  the  logical  equations  of  the  network,  in  order  to  obtain  the 
expected  frequency  with  which  different  cells  will  respond.  This  statistical 
treatment is related to the work of von Neumann (Ref. 104) on the probability
of error in networks with fallible components.
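
The substitution of frequencies for symbols can be illustrated with a small Python sketch. It assumes that the input impulses are statistically independent, so that a conjunction becomes a product and a negation becomes a complement; this is the ordinary probability calculus, offered here as an illustration of the idea rather than as a statement of the theorem in Ref. 47. The function names and the Monte Carlo check are likewise assumptions added for the example.

```python
import random


def freq_and(p, q):
    """Expected firing frequency of a cell computing A AND B."""
    return p * q


def freq_or(p, q):
    """Expected firing frequency of a cell computing A OR B."""
    return p + q - p * q


def freq_and_not(p, q):
    """Expected firing frequency of a cell computing A AND (NOT B)."""
    return p * (1.0 - q)


def simulate(logic, p, q, n_steps=100_000, seed=0):
    """Monte Carlo check: drive the logical function with independent random
    impulses of frequencies p and q and count how often the cell fires."""
    rng = random.Random(seed)
    fired = 0
    for _ in range(n_steps):
        a = rng.random() < p
        b = rng.random() < q
        fired += bool(logic(a, b))
    return fired / n_steps
```

A call such as `simulate(lambda a, b: a and not b, 0.3, 0.5)` should return a value close to `freq_and_not(0.3, 0.5)`, and the agreement improves as `n_steps` grows. If the inputs were correlated, the simple product rule would no longer hold.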

The first systematic attempt to develop a family of statistically
organized networks, and to analyze these in a rigorous fashion by means of
a genotypic approach seems to have been due to Shimbel and Rapoport, in
1948 (Ref. 92). Starting with an axiomatic representation of neurons and
connections, similar to that of McCulloch and Pitts, a network is characterized
by probability distributions for thresholds, synaptic types, and origins
of connections. A general equation is then developed for the probability that
a neuron at a specified location will fire at a specified time, as a function of
preceding activity and parameters of the net. This is applied to a number of
specific classes of networks to determine the possibility of steady-state
activity, and changes in the firing distribution with time. This work is a
forerunner of a number of stability studies (e.g., Allanson, Ref. 2) which
are still of interest.
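
The flavor of such an analysis can be conveyed by a much-simplified Python sketch, which should not be read as the Shimbel-Rapoport equation itself. It assumes a large net in which every neuron receives `k` excitatory connections from randomly chosen neurons and fires at the next time step if at least `theta` of them were active; the parameter names, the absence of inhibition, and the independence of the inputs are all assumptions of the example.

```python
from math import comb


def firing_probability(activity, k, theta):
    """P(neuron fires at t+1) given the fraction `activity` of neurons that
    fired at t: at least `theta` of its `k` inputs must have been active."""
    return sum(
        comb(k, m) * activity**m * (1.0 - activity) ** (k - m)
        for m in range(theta, k + 1)
    )


def iterate_activity(initial, k, theta, n_steps):
    """Follow the expected firing fraction over successive time steps."""
    trajectory = [initial]
    for _ in range(n_steps):
        trajectory.append(firing_probability(trajectory[-1], k, theta))
    return trajectory
```

With `k=10` and `theta=3`, a net started at an activity of 0.5 climbs toward saturation, while one started at 0.1 dies out. Even in this toy version, then, the existence and character of steady-state activity depend on the parameters of the class and on the initial firing distribution.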


The  use  of  a  digital  computer  by  Rochester  and  associates  was 
mentioned above in connection with Hebb's model. Simulation of a statistically
connected  network  to  investigate  possible  learning  capabilities  was  first 
carried  out  successfully  by  Farley  and  Clark  in  1954  (Ref.  10).  Although 
mathematical  analysis  was  not  attempted  in  either  the  Farley-Clark  or  the 
Rochester  models,  they  illustrate  a  convenient  method  of  axiomatizing  a 
network  (by  means  of  a  computer  program)  to  a  degree  which  makes  the 
investigation  of  hypotheses  possible.  While  none  of  these  experiments  led 
to  very  sophisticated  systems,  they  are  of  considerable  historical  interest, 
and  the  mechanism  for  pattern  generalization  proposed  by  Clark  and  Farley 
(Ref.  15)  is  essentially  identical  to  that  found  in  simple  perceptrons. 

Statistical models of various types have been proposed during the
last decade. In particular, the models of Beurle, Taylor, and Uttley (Refs. 6,
99, 101, 102) are of interest as attempts to analyze models with a clear
resemblance to the organization of a primitive nervous system, with receptors,
associative elements, and output or motor neurons. Moreover, in some of
these models, environments of sufficient complexity to permit the representation
of visual and temporal patterns (albeit of a very primitive type)
are  included  in  the  analysis.  Minsky  (Ref.  59)  has  also  devised  and  analyzed 
several  models  capable  of  learning  responses  to  simple  stimuli. 

A  contribution  of  considerable  methodological  significance  was 
Ashby's  "Design  for  a  Brain",  in  1952  (Ref.  3).  While  Ashby's  work  (despite 
its  title)  does  not  specify  an  actual  brain  model  in  our  present  sense,  it 
develops  the  rationale  for  an  analysis  of  closed  systems  which  must  include 
the  environment  as  well  as  the  responding  organism  and  rules  of  interaction 
as  the  object  of  study.  Ashby's  fields  of  variables  correspond  closely  to 
our  concept  of  "experimental  systems"  which  will  be  defined  in  Chapter  4. 

In  addition  to  his  conceptual  contribution,  which  is  concerned  with  the 
general  approach  to  be  used  rather  than  with  a  specific  model,  Ashby  has 
demonstrated  in  a  number  of  experiments  how  statistical  mechanisms  can 
yield  adaptive  behavior  in  an  organism. 

While  the  genotypic  approach  has  found  favor  among  many 
biologists,  it  is  by  no  means  universally  accepted.  A  typical  criticism  is 
voiced by Sutherland (Ref. 97) in connection with Hebb's system:


"When  Hebb's  theory  was  first  put  forward,  it  was  hailed 
as  showing  how  it  might  be  possible  to  account  for  behavior 
in terms of plausible neurophysiological mechanisms...
However, a moment's reflection shows that, if he is right,
what he has really succeeded in doing is to demonstrate
the utter impossibility of giving detailed neurophysiological
mechanisms for explaining psychological or behavioral
findings. According to Hebb the precise circuits used in
the brain for the classification of a particular shape will
vary from individual to individual with chance variation
in nerve connectivity determined by genetic and maturational
factors... Different individuals will achieve the
same end result in behavior by very different neurological
circuits... If Hebb's general system is right, it precludes
the possibility of ever making detailed predictions about
behavior from a detailed model of the system underlying
behavior."


While objections such as this seem to stem from a misunderstanding
of the possibility of obtaining seemingly deterministic phenomena from a
statistical substrate (as in statistical mechanics), the above argument is bolstered
by many findings which suggest complicated hereditary mechanisms
for  the  analysis  of  stimuli  in  "instinctive"  behavior.  The  work  of  Sperry 
and  Lettvin  has  already  been  cited  in  connection  with  the  mechanisms  for 
precise  localization  of  connections  which  seem  to  exist  in  the  brain.  Our 
conclusion  is  that  the  biological  system  must  employ  some  mixture  of 
specific  connection  mechanisms  and  statistically  determined  structures; 
just  how  much  constraint  is  present  in  the  genetic  constitution  of  the  brain  is 
an  open  question. 

On most of the specific points of criticism raised in connection
with monotypic models, the genotypic approach seems to fare much better.
Detailed psychological functions are not required as a starting point. Detailed
physiological knowledge of the brain would be helpful, but even a rough parametric
description enables us to start off in the right direction, and present
models  have  a  considerable  way  to  go  before  they  have  assimilated  all  of  the 
physiological  data  which  are  available. 

Since  this  approach  begins  with  the  physical  model  rather  than  the 
functions  which  must  be  performed,  it  is  easy  to  guarantee  its  conformity  in 
size  and  organization  to  the  general  characteristics  of  a  biological  system. 
Most  important  is  the  fact  that  this  approach  appears  to  be  yielding  results  of 
increasing  significance  and  interest,  and  the  models  frequently  suggest 
progressive  lines  of  development  from  simple  first  approximations  to  more 
sophisticated systems. In the application of the genotypic approach to perceptrons,
a number of laws of considerable generality have been discovered,
as  will  be  seen  in  subsequent  chapters. 

2.4  Position  of  the  Present  Theory 

The  groundwork  of  perceptron  theory  was  laid  in  1957,  and 
subsequent  studies  by  Rosenblatt,  Joseph,  and  others  have  considered  a 
large number of models with different properties (Refs. 7, 30, 31, 40,
41, 76, 79, 80, 81, 82, 84, 85, 86). Perceptrons are genotypic models,
with  a  memory  mechanism  which  permits  them  to  learn  responses  to 
stimuli  in  various  types  of  experiments.  In  each  case,  the  object  of 
analysis is an experimental system which includes the perceptron, a defined
environment,  and  a  training  procedure  or  agency.  Results  of  such  analyses 
can  then  be  compared  with  results  of  comparable  experiments  on  human  or 
animal  subjects  to  determine  the  functional  correspondence  and  weaknesses 
of  the  model.  A  number  of  specific  psychological  tasks  and  criteria,  which 
will  be  discussed  in  the  following  chapter,  are  used  for  the  comparison  of 
different  systems. 

Perceptrons  are  not  intended  to  serve  as  detailed  copies  of  any 
actual  nervous  system.  They  are  simplified  networks,  designed  to  permit 
the  study  of  lawful  relationships  between  the  organization  of  a  nerve  net,  the 
organization of its environment, and the "psychological" performances of which
the  network  is  capable.  Perceptrons  might  actually  correspond  to  parts  of 
more  extended  networks  in  biological  systems;  in  this  case,  the  results 
obtained  will  be  directly  applicable.  More  likely,  they  represent  extreme 
simplifications  of  the  central  nervous  system,  in  which  some  properties  are 
exaggerated,  others  suppressed.  In  this  case,  successive  perturbations  and 
refinements  of  the  system  may  yield  a  closer  approximation. 

The  main  strength  of  this  approach  is  that  it  permits  meaningful 
questions  to  be  asked  and  answered  about  particular  types  of  organization, 
hypothetical memory mechanisms, and neuron models. When exact
analytic  answers  are  unobtainable,  experimental  methods,  either  with 
digital  simulation  or  hardware  models,  are  employed.  The  model  is  not 
a  terminal  result,  but  a  starting  point  for  exploratory  analysis  of  its 
behavior.

3.  PHYSIOLOGICAL AND PSYCHOLOGICAL CONSIDERATIONS

In the last chapter, a methodological doctrine was proposed,
which undertakes to evaluate classes of brainlike systems by comparing
their performance with that of biological subjects in behavioral experiments;
by gradually increasing the sophistication and varying the axiomatic
constraints which define the experimental systems, it is hoped that
models  which  closely  resemble  the  biological  prototype  can  ultimately  be 
achieved.  In  this  chapter,  the  desiderata  for  a  satisfactory  brain  model 
are  considered  in  more  detail,  from  the  standpoint  of  physiology  and 
psychology.  What  are  the  parametric  constraints,  functional  properties, 
and  performance  criteria  which  must  be  met,  in  order  to  achieve  a  model 
which  is  a  plausible  representation  of  the  brain? 

The following discussion comes under three main headings:
(1) established fundamentals; (2) current issues; and (3) the design of
experimental  tests  of  performance.  It  is  not  our  purpose  to  review  all  of 
the  relevant  background  in  biology  and  psychology,  but  rather  to  highlight 
those  points  which  bear  most  directly  upon  the  present  undertaking,  and 
to  suggest  certain  areas  in  which  investigations  might  provide  decisive 
evidence  for  or  against  some  of  the  models  which  we  shall  propose.  It 
will be noted that no attempt has been made to distinguish specifically
"psychological" or specifically "physiological" problems in the following
sections.  Such  distinctions  are  not  only  arbitrary  in  a  number  of  the 
cases  considered,  but  also  tend  to  obscure  the  fact  that  we  are  interested 
in  all  of  these  problems  because  of  their  relevance  to  brain  models,  rather 
than to psychology or physiology per se. In this discussion, attention
will be concentrated on the level of complexity which seems most commensurate
with that of the proposed models. Psychological material on psychoneuroses,
or on attitude formation, for example, while it might be brought
to  bear  on  the  evaluation  of  some  future  models,  is  hardly  likely  to  be 
relevant  at  this  time.  On  the  physiological  side,  we  are  chiefly  concerned 
with  the  overall  organization  of  the  nervous  system,  its  microstructure, 
and  conditions  for  impulse  transmissions;  we  are  less  concerned  with 
details  of  neuroanatomy  and  neurochemistry,  although  such  data  may 
become  important  in  more  sophisticated  models,  where  a  closer  correlation 
with  the  biological  system  is  sought. 


3.1  Established  Fundamentals 


3.1.1  Neuron Doctrine and Nerve Impulses


It  was  only  during  the  first  decade  of  this  century  that  a  strong 
case  was  developed  for  regarding  the  neuron  as  the  basic  anatomical  unit 
of  the  nervous  system.  The  demonstration  that  this  is  the  case  rests  largely 
upon  the  work  of  Ramon  y  Cajal  (Ref.  14).  Since  Cajal's  time,  a  great  variety 
of  neurons,  differing  in  size,  numbers  of  dendritic  and  axonal  processes,  and 
the distribution of these, have been described by neuroanatomists (Refs. 8,
52, 93). Today it is generally accepted that in virtually all biological species,
the nervous system consists of a network of neurons, each consisting of a
cell body with one or more afferent (incoming) processes, or dendrites, and
one or more efferent (outgoing) processes, or axons. The axons branch into
small fibers which may make contact with, but remain separate from, the
surface membrane of cells or dendrites upon which they terminate. Neurons
are generally divided into three classes: (1) sensory neurons, which generate
signals in response to energy applied to sensory transducers, such as photoreceptors
or pressure sensitive corpuscles; (2) motor neurons (or effector
neurons), which transmit signals to muscles or glands and directly control
their activity; (3) internuncial neurons (or associative neurons), which form
a network connecting sensory and motor neurons to one another. The brain,
or central nervous system, is made up almost entirely of neurons of this
last type.

The  actual  signals  carried  by  these  neurons  may  take  one  of 
several  forms.  Until  recently,  it  was  supposed  that  all  information  in  the 
nervous  system  was  represented  by  a  code  of  all-or-nothing  impulses, 
corresponding  to  on-off  states  of  the  neurons.  A  sufficient  input  signal  was 
supposed  to  trigger  the  receiving  cell  directly  into  emitting  a  spike  potential, 
which  was  transmitted  without  decrement  from  the  receiving  region  of  the 
dendrites  to  the  cell  body,  and  out  along  the  axon  to  the  terminal  endbulbs, 
where  it  might  or  might  not  succeed  in  triggering  later  cells  in  the  network. 

In  a  recent  review  (Ref.  11)  Bullock  has  pointed  out  that  this  view  has  been 
largely  supplanted  by  a  far  more  complicated  picture.  While  it  is  true  that 
the  transmission  of  signals  over  long  distances  is  generally  accomplished 
by  means  of  all-or-nothing  spike  propagation  along  the  axons  of  nerve  cells, 
the  spike  impulse  is  not  a  direct  response  to  impulses  which  arrive  at  the 
dendrites, and may originate at a point which is separated by a considerable
distance from the site at which incoming impulses are received. Essentially,
the currently accepted concept is that the dendritic structure and cell body
jointly act as an integrating system, in which a series of incoming signals
interact to establish a pre-firing state in a region at the base of the axon,
from which impulses originate. If this pre-firing state reaches a threshold
level (presumably measured by membrane depolarization) at a point within
the critical region, a spike potential is initiated, and spreads without decrement
along the axon. The interactions which may occur in the cell body and
dendrites,  however,  involve  potential  fields  in  which  the  effects  of  impulses 
received  at  a  given  point  spread  over  the  surrounding  membrane  surface  in 
a  decrementing  fashion.  These  effects  may  be  graded  in  intensity,  depending 
on  frequency  of  impulses  received,  and  the  state  of  the  receiving  membrane 
at  the  time.  Successions  of  impulses  arriving  at  the  same  synapse  can 
sometimes  cause  an  increase  in  the  sensitivity  of  the  receiving  membrane 
(facilitation)  and  can  sometimes  cause  a  progressive  diminution  in  sensitivity 
(Ref.  11).  There  is  evidence  to  suggest  that  different  local  patches  of  surface
membrane  are  differently  specialized,  and  respond  in  different  ways  to 
impulses  received,  even  within  the  same  neuron.  Some  of  these  regions 
appear  to  act  as  sources  of  internally  generated  signals,  which  may  lead 
to  spontaneous  activity  of  the  neuron,  and  the  emission  of  spike  impulses 
without  any  input  signals  from  outside  the  cell. 

Two  main  types  of  synapses  are  recognized:  excitatory  and
inhibitory.  It  is  generally  assumed,  although  it  has  not  been  proven,  that 
a  single  neuron  is  either  all  excitatory  or  all  inhibitory,  in  its  effect  upon 
post-synaptic  cells.  It  remains  possible,  however,  that  the  individual
synaptic  endings  are  specialized,  some  of  them  releasing  a  depolarizing 
transmitter  substance  (excitatory  endings)  while  others  release  a  hyper- 
polarizing  substance  (inhibitory  endings).  A  single  synapse,  so  far  as 
is  known,  remains  either  excitatory  or  inhibitory,  and  is  incapable  of 
changing  from  one  to  the  other. 

The  nerve  impulse  itself  is  a  basically  non-linear  response  to 
stimulation.  It  is  supported  by  energy  reserves  of  the  axon  by  which  it  is 
transmitted,  rather  than  by  a  propagation  of  energy  from  the  sources  of 
excitation.  The  nerve  impulse  is  manifested  by  a  moving  zone  of  electrical 
depolarization  of  the  surface  membrane  of  the  neuron,  the  exterior  of  which 
is  normally  70  to  100  millivolts  positive  relative  to  the  interior.  This  zone 
tends  to  spread  along  the  axon  due  to  ionic  currents  which  tend  to  break 
down  the  potential  difference  between  the  interior  and  exterior  of  the 
neuron,  until  the  membrane  is  repolarized  by  metabolic  processes  (see
Eccles,  Refs.  18,  19).  The  resulting  "spike  potential"  takes  the  form  of
an  electrically  negative  impulse  (measured  relative  to  the  normal  surface
potential  of  the  membrane)  which  propagates  down  the  fiber  with  an  average
velocity  of  about  10  to  100  meters  per  second,  depending  on  the  diameter
of  the  fibers  (cf.  Brink,  Ref.  9).

The  arrival  of  a  single  (excitatory)  impulse  gives  rise  to  a 
partial  depolarization  of  the  post-synaptic  membrane  surface,  which 
spreads  over  an  appreciable  area,  and  decays  exponentially  with  time.
This  is  called  a  local  excitatory  state  (l.e.s.).  The  l.e.s.  due  to
successive  impulses  is  (approximately)  additive.  Several  impulses
arriving  in  sufficiently  close  succession  may  thus  combine  to  touch  off
an  impulse  in  the  receiving  neuron  if  the  local  excitatory  state  at  the  base 
of  the  axon  achieves  the  threshold  level.  This  phenomenon  is  called 
temporal  summation.  Similarly,  impulses  which  arrive  at  different  points 
on  the  cell  body  or  on  the  dendrites  may  combine  by  spatial  summation  to
trigger  an  impulse  if  the  l.e.s.  induced  at  the  base  of  the  axon  is  strong
enough.
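
The summation rule described above can be illustrated with a short numerical sketch. It is not taken from the text: the function names, the amplitude of 0.45 per impulse, the 5 ms decay constant, and the unit threshold are all assumed values chosen for illustration, and the spatial decrement of the l.e.s. over the membrane is ignored.

```python
import math

def les_at_axon(arrival_times_ms, t_ms, amplitude=0.45, tau_ms=5.0):
    """Summed local excitatory state (l.e.s.) at time t_ms.

    Each impulse contributes `amplitude`, decaying exponentially with
    time constant `tau_ms`; contributions are simply added.
    All numeric defaults are illustrative assumptions.
    """
    return sum(
        amplitude * math.exp(-(t_ms - t0) / tau_ms)
        for t0 in arrival_times_ms
        if t0 <= t_ms
    )

def fires(arrival_times_ms, threshold=1.0, amplitude=0.45, tau_ms=5.0):
    """True if the summed l.e.s. ever reaches the threshold.

    With purely decaying contributions the sum can only peak at an
    arrival time, so it is enough to check those instants.
    """
    return any(
        les_at_axon(arrival_times_ms, t, amplitude, tau_ms) >= threshold
        for t in arrival_times_ms
    )

# Temporal summation: three impulses 1 ms apart reach the threshold,
# while the same three impulses spaced 20 ms apart do not.
close_together = fires([0.0, 1.0, 2.0])
far_apart = fires([0.0, 20.0, 40.0])
```

Under these assumptions, spatial summation is the special case in which several impulses arrive at different points at the same instant, for example `fires([0.0, 0.0, 0.0])`.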


The  passage  of  an  impulse  in  a  given  cell  is  followed  by  an 
absolute  refractory  period  during  which  the  cell  cannot  be  fired  again, 
regardless  of  the  level  of  input  activity.  This  is  equivalent  to  an  infinite 
threshold  during  this  period.  The  spike  potential  and  absolute  refractory 
period  last  about  1  millisecond.  Finally,  there  is  a  relative  refractory 
period  which  may  last  for  many  milliseconds  after  the  initial  impulse. 
During  this  time,  the  threshold  gradually  returns  to  normal,  and  may 
even  fall  to  somewhat  below  its  normal  level  for  a  time.  While  the 
response  of  a  cell  to  a  single  momentary  stimulus,  such  as  an  electrical 
pulse,  is  markedly  non-linear  (the  amplitude  of  the  generated  impulse 
being  quite  independent  of  the  amplitude  of  the  triggering  signal)  the 
effect  of  a  sustained  excitatory  signal,  in  many  cases,  is  to  evoke  a 
volley  of  output  spikes,  the  frequency  of  which  may  be  roughly  proportional
to  the  intensity  of  the  stimulus  over  a  wide  range.  This  is  particularly
true  of  sensory  neurons,  where  the  frequency  of  firing  may  be
used  to  determine  the  intensity  of  the  stimulus  energy  with  considerable 
accuracy. 
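
A rough sketch of how an all-or-nothing element can still signal intensity by its firing frequency is given below. It is an illustrative simplification rather than a model proposed in the text: the integrator has no leak, only the absolute refractory period (about 1 millisecond, as stated above) is represented, and the names, time step, and threshold are assumptions.

```python
def spike_count(intensity, duration_ms=1000.0, dt_ms=0.1,
                threshold=1.0, refractory_ms=1.0):
    """Count output spikes evoked by a sustained input of given intensity.

    The pre-firing state grows by intensity * dt_ms per step (no leak,
    an assumption). When it reaches the threshold a spike is emitted,
    the state is reset, and the cell ignores input for refractory_ms.
    The relative refractory period is not modelled.
    """
    steps = int(round(duration_ms / dt_ms))
    refractory_steps = int(round(refractory_ms / dt_ms))
    state = 0.0
    wait = 0          # remaining refractory steps
    spikes = 0
    for _ in range(steps):
        if wait > 0:  # absolute refractory period: input has no effect
            wait -= 1
            continue
        state += intensity * dt_ms
        if state >= threshold:
            spikes += 1
            state = 0.0
            wait = refractory_steps
    return spikes

# Stronger sustained input gives a higher spike frequency, but the
# refractory period caps the rate near one spike per millisecond.
rates = [spike_count(x) for x in (0.2, 0.4, 0.8)]
```

Each individual spike is identical regardless of input strength; only the count over the observation interval changes with intensity.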

The  general  picture  of  the  nervous  system,  then,  is  one  of  a
large  set  of  signal  generators,  each  having  one  or  more  outputs,  on  which 
nerve  impulses  may  appear.  These  impulses  may  vary  in  frequency,  and 
to  some  extent  in  amplitude,  but  seem  to  carry  information  mainly  in  a 
pulse-coded  form.  The  signal  generators  themselves  are  decision  elements 
of  a  most  intricate  type;  each  one  makes  its  decision  to  initiate  an  output 
impulse  according  to  a  complicated  function  of  the  series  of  signals  received 
at  each  of  its  synapses  or  receptor  areas,  as  well  as  its  own  internal  state. 
In  a  brain  model,  a  neuron  of  this  complexity  would  tend  to  make  the  system 
unintelligible  and  unmanageable  with  the  analytic  and  mathematical  tools 
at  our  disposal.  Simplifications  will  therefore  be  introduced,  as  in  the 
manner  of  the  McCulloch-Pitts  neuron;  but  it  should  be  remembered  that
the  biological  neuron  is  considerably  more  complicated,  and  may  incorporate 
within  itself  functions  which  we  require  whole  networks  of  simplified  neurons 
to  realize.

3.1.2  Topological  Organization  of  the  Network 


The  human  brain  consists  of  some  10^10  neurons  of  all  types.
These  are  arranged  in  a  network  which  receives  inputs  from  receptor 
neurons  at  one  end,  and  conveys  signals  to  the  effector  neurons  at  the 
output  end.  Different  sensory  modalities  --  vision,  hearing,  touch,  etc.  --
communicate  with  the  central  nervous  system  by  way  of  distinct  nerve
bundles,  which  enter  it  at  different  points.  Each  of  these  modalities, 
after  passing  its  information  through  a  network  of  cells  which  respond 
more  or  less  exclusively  to  stimuli  from  that  modality,  eventually  contributes
to  a  common  pool  of  activity  in  the  "association  areas"  of  the  central
nervous  system  (CNS).  Output  signals  originate  either  from  the  parts  of 
the  CNS  which  are  specific  to  a  particular  modality  (for  example,  the 
pupillary  reflex  mechanism)  or  from  the  common  activity  areas  (as  in
speech).  Final  outputs  may  go  through  a  series  of  stages  in  which  motor 
patterns  or  sequences  are  selected,  and  detailed  coordination  is  regulated. 
From  these  motor  control  regions,  feedback  paths  re-enter  the  association 
areas  and  sensory  integration  areas,  so  that  the  possibility  of  an  elaborate 
servo-mechanism  for  the  control  of  motor  activity  exists. 

While  this  general  picture  holds  true  for  most  biological
organisms,  there  is  considerable  variation  both  in  gross  and  detailed
anatomy,  from  species  to  species  and  individual  to  individual.  In  undertaking
to  design  a  first  order  approximation  to  this  structure  for  use  in  a
brain  model,  we  will  begin  with  a  network  consisting  of  a  single  array  of 
sensory  units,  a  layer  of  association  units,  and  a  single  effector,  or 
response  unit.  In  later  models,  more  complicated  structures  will  be 
considered.  Even  the  simplest  models,  however,  are  capable  of  showing 
a  surprising  similitude  to  the  functional  properties  of  the  brain.  It  seems 
reasonable,  therefore,  to  regard  the  complications  of  neuroanatomy  in  the 
various  species  as  elaborations  of  a  basically  simple  schema,  which  is  to 
be  found  throughout.  This  basic  plan  of  organization  is  illustrated  in 
Figure  1.

Figure  1.  BASIC  TOPOLOGICAL  STRUCTURE  OF  THE  NERVOUS  SYSTEM  AND  ITS  SOURCES  OF  INFORMATION


The  distribution  of  cell  types  and  connection  patterns  has  been 
studied  by  Lorente  de  No,  Sholl,  Bok,  and  others  (Refs.  8,  52,  93).  A 
typical  cell  in  the  cerebral  cortex  receives  input  connections  from  some
hundreds  of  other  cells,  which  may  be  located  in  widely  scattered  regions, 
but  its  output  is  more  likely  to  be  transmitted  to  a  relatively  localized 
region.  Cells  which  receive  sensory  input  signals  are  likely  to  have  a 
restricted  field  of  origins  in  a  sensory  surface,  such  as  the  retina  or 
the  skin. 
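
The first-order approximation described above (an array of sensory units, a layer of association units, and a single response unit) might be sketched as follows. This is only an illustration under stated assumptions: the function names, the random choice of origin points, the fixed fan-in, and the simple threshold rules are hypothetical and are not the formal definitions developed later in the text.

```python
import random

def build_network(n_sensory, n_association, fan_in, seed=0):
    """Pick, for each association unit, a random set of sensory origins.

    Returns a list of index lists; random wiring with a fixed fan-in
    is an assumption made for this sketch.
    """
    rng = random.Random(seed)
    return [rng.sample(range(n_sensory), fan_in)
            for _ in range(n_association)]

def respond(stimulus, origins, a_threshold, r_weights, r_threshold=0.0):
    """Propagate a binary stimulus through the three-stage network.

    An association unit is active when at least a_threshold of its
    origin points are stimulated; the response unit is on when the
    weighted sum of active association units exceeds r_threshold.
    """
    a_out = [1 if sum(stimulus[i] for i in src) >= a_threshold else 0
             for src in origins]
    total = sum(w * a for w, a in zip(r_weights, a_out))
    return (1 if total > r_threshold else 0), a_out

origins = build_network(n_sensory=16, n_association=8, fan_in=4, seed=1)
response, activity = respond([1] * 16, origins, a_threshold=2,
                             r_weights=[1.0] * 8)
```

Restricting each association unit's origins to a localized window of the sensory array, rather than sampling the whole array, would be one way to reflect the restricted fields of origin mentioned above.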


The  mapping  of  the  frog  retina  into  the  brain  has  been  studied 
by  Lettvin  (Ref.  51)  who  finds  a  rather  precise  topographic  mapping,  in 
which  several  different  types  of  information  are  represented  in  different
layers.  This  topographic  mapping  is  established  genetically  despite
the  fact  that  the  fibers  which  transmit  the  information  from  the  retina
are  apparently  completely  "scrambled"  in  the  optic  nerve.  Moreover,
experiments  by  Sperry  (Ref.  94)  and  more  recently  by  Lettvin  (Ref.  51) 
show  that  if  the  optic  nerve  is  severed  and  allowed  to  grow  together  again, 
the  fibers  which  originally  transmitted  to  a  particular  terminal  location  will 
tend  to  reconnect  to  that  same  terminal  location,  with  surprisingly  little 
loss  of  precision.  This  points  to  a  highly  specific  neural  organizing 
capability,  which  must  be  taken  into  account  in  considering  admissible 
types  of  constraints  for  a  brain  model.  In  the  mammalian  brain,  each 
sensory  modality  appears  to  be  represented  by  an  orderly  topographic 
mapping  analogous  to  that  just  described.  Auditory  stimuli,  for  example, 
are  mapped  into  a  region  which  is  organized  according  to  pitch;  tactile 
stimuli  are  mapped  according  to  body  location,  and  so  forth.  Similarly,
the  motor  neurons  are  organized,  in  the  cerebral  cortex,  in  an  ordered
arrangement  which  is  topologically  similar  to  the  organization  of  the
muscles  which  are  controlled.

See  also  Section  3.1.4.

In  contrast  to  the  highly  specific  regional  organization  in  the 
gross  anatomy  of  the  sensory  projection  areas  of  the  cortex,  the  detailed 
microstructure  of  the  network  appears  to  be  essentially  random,  governed 
only  by  directional  gradients  and  preferences,  and  statistical  distributions 
of  fiber  lengths  for  various  types  of  cells  (see  Sholl,  Ref.  93).  In  the 
human  nervous  system,  it  appears  that  the  most  specific  and  constrained 
topological  organizations  are  to  be  found  in  the  sensory  and  motor  systems, 
while  the  intervening  association  network  of  the  CNS  is  less  tightly 
controlled  in  its  organization,  presumably  depending  more  on  learning 
and  adaptive  modification  to  establish  the  required  pathways  and  linkages. 
The  degree  of  precision  in  establishing  the  topological  organization  of 
neurons  in  even  the  most  highly  constrained  reflex  mechanisms  is  probably 
far  less  than  that  in  most  artificial  data  processing  devices,  and  must  retain 
a  certain  degree  of  randomness  wherever  the  number  and  density  of 
connections  is  appreciable.  Unfortunately,  no  data  are  available  which 
would  indicate  the  complexity  of  topological  constraints  which  correspond 
to  the  highly  complex  inherited  behavior  patterns  which  are  known  to 
exist  in  many  species.  Since  the  nature  of  such  constraints  is  unknown, 
we  shall  avoid  gratuitous  assumptions  about  them,  as  far  as  possible. 

In  the  development  of  brain  models,  it  will  be  our  general  strategy  to  start 
out  with  minimally  constrained  networks,  and  examine  the  consequences  of 
introducing  particular  types  of  constraints,  one  at  a  time. 


3.1.3  Localization  of  Function


Ever  since  the  brain  was  first  credited  with  the  control  of 
psychological  activity,  attempts  have  been  made  to  delineate  separate 
functions  for  its  different  parts.  In  the  last  century  (largely  under  the 
influence  of  Gall)  this  took  the  form  of  an  assignment  of  "mental  faculties"
such  as  intelligence,  combativeness,  amativeness,  and  religiosity,  to 
special  regions  of  the  brain.  As  techniques  for  the  study  of  functional 
anatomy  improved,  this  gave  way  to  a  concept  of  organization  into  sensory 
tracts,  motor  tracts,  and  association  tracts.  The  functional  organization 
which  was  revealed  has  been  most  firmly  established  in  the  case  of  sensory 
and  motor  tracts,  where  a  particular  position  in  the  brain  is  correlated  with 
a  particular  sensory  locus,  or  a  particular  set  of  muscles  whose  activity  it 
controls.  An  excellent  review  of  sensory  and  motor  mapping  can  be  found 
in  Ruch  (Refs.  88,  89).  More  recently,  a  finer  breakdown  in  the  localization
of  sensory  functions  has  been  demonstrated  by  Lettvin  and  associates  (Ref.  51). 
Four  distinct  types  of  information,  involving  distinct  aspects  of  the  visual 
stimulus  (contrast,  curvature,  movement,  and  dimming  of  illumination)  have 
been  shown  to  be  mapped  into  four  distinct  layers  of  the  tectum  of  the  frog. 

This  suggests  localization  of  analytic  functions,  of  a  sort  which  has  been 
suspected  but  not  previously  demonstrated. 

In  dealing  with  the  so-called  "association  areas"  of  the  cerebral 
cortex,  and  with  other  parts  of  the  brain  which  are  not  clearly  related  to 
sensory  data  processing  or  motor  coordination,  something  of  the  old 
treatment  in  terms  of  "mental  faculties"  still  remains;  specifically, 
centers  have  been  found  which  are  commonly  credited  with  primary
responsibility  for  temporary  and  permanent  memory,  for  emotional  behavior,
for  speech  recognition  and  speech  production,  and  (in  the  frontal  lobes)  for
the  integration  of  complex  goal-directed  activities.  The  lack  of  clear  operational
tests  for  such  capabilities  has  been  a  hindrance  to  progress  in  such
functional  mapping,  and  the  results  are  considerably  more  ambiguous  than 
is  the  case  with  sensory  and  motor  functions.  A  discussion  of  current 
evidence  on  brain  localization  with  respect  to  these  "higher  faculties"  is 
found  in  Pribram  (Ref.  72).  Much  of  the  recent  work  is  concerned  with  the 
localization  of  tracts  which  influence  motivation,  alertness,  and  consciousness
in  the  organism  (Refs.  1,  22,  38,  64,  65).

One  feature  which  is  of  particular  importance  for  brain  models 
is  the  apparent  plasticity  of  localization  in  the  "association  areas"  (or
"intrinsic  systems",  to  use  the  terminology  advocated  by  Pribram)  in
contrast  to  the  relatively  fixed  and  irreplaceable  character  of  the  sensory
and  motor  tracts.  Loss  of  function,  due  to  destruction  of  association  cortex, 
is  apt  to  be  transient,  with  adjacent  areas  taking  over  the  function  after  a 
period  of  readaptation.  Jackson,  in  his  classic  studies  of  the  motor  cortex, 
(Ref.  36)  observed  that  even  here  localization  is  not  rigid  and  absolute,  and 
that  a  certain  amount  of  flexibility  exists,  permitting  the  functions  of  damaged 
tissue  to  be  taken  over  by  neighboring  areas.  The  sensory  projection  areas, 
on  the  other  hand,  appear  to  be  indispensable  to  perception;  destruction  of
the  optical  cortex  leads  to  permanent  blindness  in  an  area  corresponding  to 
the  location  of  the  lesion,  and  similar  phenomena  are  to  be  found  in  other 
sensory  modalities.  Thus,  the  extreme  hypothesis  of  equipotentiality
advocated  originally  by  Lashley  (Ref.  49)  (who  observed  that  cortical
ablation  appeared  to  produce  a  general  deficit  in  performance  proportional
to  the  amount  of  cortex  extirpated,  rather  than  eliminating  specific  memories 
and  abilities)  has  been  modified  in  the  direction  of  relative  localization, 
which  is  quite  strict  for  certain  sensory  functions,  and  comparatively  weak 
and  readily  modified  for  more  complicated  control  functions,  thinking,  and 
memory.


A  rather  different  approach  to  localization  is  suggested  by  the 
histological  studies  of  cortical  tissue,  initiated  originally  by  Brodmann,  and 
pursued  more  recently  by  Lorente  de  No  and  Sholl  (Refs.  52,  93).  The
"cytoarchitectonic  areas"  which  have  been  described  in  these  studies  differ
in  their  microstructure  and  detailed  organization,  and  attempts  have  been  made 
to  relate  such  differences  to  the  function  of  the  cortex  in  which  they  occur. 

To  date,  this  approach  has  not  led  to  particularly  significant  results,  although 
in  principle  it  may  ultimately  suggest  the  essential  organizational  properties 
which  must  be  incorporated  into  a  brain  model. 

At  the  primitive  level  of  organization  to  which  our  models  will 
aspire  at  this  time,  current  data  on  brain  localization  are  of  only  secondary 
interest.  The  main  features  of  the  brain  still  seem  to  be  adequately 
described  by  the  general  topological  structure  shown  in  Fig.  1.  The 
"central  integration  and  control  network"  indicated  in  the  diagram  is  known 
to  possess  some  important  internal  demarcations  in  higher  organisms,  but 
the  precise  functions  of  these  parts  and  their  interrelations  are  still  largely
speculative.  In  simpler  brains  (crustacea,  for  example)  the  gross 
organization  is  probably  no  more  complex  than  indicated  by  the  diagram; 
and  it  seems  likely  that  in  general  it  is  the  fine  structure,  rather  than  the 
gross  anatomy,  which  determines  the  functional  properties  of  the  network.

3.1.4  Innate  Computational  Functions


There  is  no  doubt  that  mechanisms  of  considerable  complexity, 
sufficient  for  perceptual  tasks  and  the  control  of  organized  behavior,  can 
be  created  by  genetic  control  of  growth  and  maturation.  This  is  most 
dramatically  evident  in  the  instinctual  patterns  of  insects  (for  example, 
the  well  known  communication  system  of  bees,  and  the  frequently  cited 
behavior  patterns  of  carpenter  wasps),  but  is  also  clearly  present  in 
vertebrates  (e.g.,  the  spawning  behavior  of  salmon,  and  the  migratory 
behavior  of  birds,  as  described  in  Ref.  90).  Recently,  Gibson  and  Walk
have  furnished  clear  experimental  evidence  for  the  innate  perception  of 
depth  in  mammals  (Ref.  24).  All  of  these  phenomena  require  "built-in" 
control  mechanisms,  of  a  rather  intricate  sort.  In  the  cases  just  cited,
these  built-in  mechanisms  are  not  known  in  any  detail.  A  number  of  more
elementary  functions  have  been  discovered,  however,  which  provide  some
picture  of  the  types  of  "computational  mechanisms"  which  are  likely  to
exist  throughout  the  central  nervous  system.

The  stimulus  analyzing  mechanisms  discovered  by  Lettvin  and 
associates  for  frog  vision  have  already  been  mentioned.  In  these  studies,  it 
is  found  that  certain  ganglion  cells  in  the  frog  retina  respond  only  to  contours 
or  strong  contrast  gradients  within  their  sensory  field;  others  respond  only  to 
convex  images;  others  to  moving  boundaries;  and  still  others  to  a  general 
dimming  of  illumination  over  their  entire  field.  Each  of  these  four  cell  types 
transmits  its  information  to  a  distinct  layer  of  the  frog's  tectum,  where  its 
position  is  mapped  topographically.  Thus,  one  layer  represents  a  contour
map,  or  outline  drawing  of  the  stimulus  field,  another  represents  a  location
map  for  small  convex  objects  or  corners,  a  third  represents  movement
vectors,  and  a  fourth  indicates  regions  of  dimming  illumination.

Other  visual  analyzing  mechanisms  have  recently  been  demonstrated  by
Hubel  and  Wiesel  (Ref.  113)  in  the  cat's  cortex  (see  Chapter  23).

At  the  motor-control  end  of  the  nervous  system,  a  number  of
reflex  arcs  and  servo-control  systems  have  been  analyzed.  The  pupillary 
reflex,  for  example,  has  been  analyzed  as  a  typical  servomechanism  by 
Stark  and  Baker  (Ref.  96).  A  considerable  amount  of  work  has  also  been 
done  on  the  cerebellar  servomechanisms  which  regulate  muscular  action 
under  the  control  of  cortical  decisions  and  kinesthetic  feedback  information 
(cf.  Ruch,  Ref.  89).  It  is  probably  safe  to  assume  that  similar  closed-loop
control  systems,  employing  familiar  servomechanism  principles,  are 
employed  throughout  the  central  nervous  system  for  such  purposes  as 
controlling  level  of  activity,  preventing  runaway  excitation  phenomena 
(such  as  occur  in  epileptic  seizures),  and  regulating  sensitivity  to  selected 
aspects  of  the  sensory  input  data. 
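
The closed-loop principle referred to above can be illustrated with a minimal proportional feedback loop, loosely patterned on the pupillary reflex. This is a toy sketch and not the analysis of Stark and Baker: the variable names, the target flux, the gain, and the assumption that retinal flux equals light intensity times pupil area are all hypothetical.

```python
def regulate(light, target_flux=1.0, gain=0.05, steps=2000, area=1.0):
    """Adjust pupil area by negative feedback until flux nears the target.

    flux  = light * area          (assumed relation)
    error = target_flux - flux    (fed back to the controller)
    The area is corrected in proportion to the error on every step.
    The loop is stable only while light * gain < 2; beyond that the
    corrections overshoot and grow, a simple picture of runaway.
    """
    for _ in range(steps):
        flux = light * area
        error = target_flux - flux
        area = max(0.0, area + gain * error)  # area cannot be negative
    return area, light * area

# The same loop settles to the target flux for dim and bright light,
# ending with a wide pupil in the first case and a narrow one in the second.
dim_area, dim_flux = regulate(light=0.5)
bright_area, bright_flux = regulate(light=10.0)
```

No control formula is evaluated from encoded data here; the error signal acts directly and continuously on the controlled quantity, which is the sense in which such mechanisms are described as analog.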

It  is  worth  noting  that  most  of  the  specific  computing  mechanisms 
used  in  muscular  control  appear  to  be  of  an  analog  variety,  rather  than  digital; 
they  make  use  of  intensities  and  frequencies  of  activity  for  the  direct  control 
of  servo-systems,  rather  than  computing  a  control  formula  from  encoded 
data  and  then  generating  the  control  signal  required.  The  stimulus  analyzing 
mechanisms  found  by  Lettvin,  however,  constitute  a  sort  of  digital  code,  in 
which  stimulus  properties  are  represented  by  presence  or  absence  of  signals 
from  particular  neurons.  It  seems  likely,  as  von  Neumann  has  observed 
(Ref.  105)  that  the  brain  makes  extensive  use  of  both  digital  and  analog 
principles  in  its  operation,  and  it  appears  that  both  types  of  devices  may 
be  genetically  determined.

An  interesting  example  of  theoretical  speculations  on  possible
computational  functions  employed  in  shape  discrimination  in  the  octopus  can 
be  found  in  Sutherland  (Ref.  98).  Sutherland  reviews  several  alternative 
theories,  and  presents  evidence  in  support  of  his  own  conjecture  that  the 
octopus  responds  to  an  analysis  of  the  horizontal  and  vertical  dimensions 
of  the  stimulus  measured  along  all  possible  cross-sections.  No  attempt  is 
made,  however,  to  tie  the  computational  process  to  a  particular  neurological 
structure,  or  to  indicate  a  mechanism  which  might  carry  out  the  indicated 
operations.

3.1.5  Phenomena  of  Learning  and  Forgetting


Thus  far,  we  have  concentrated  on  the  anatomical  and  physiological
features  of  the  nervous  system  which  appear  to  be  basic  for  the
design  of  a  brain  model.  We  now  turn  to  some  of  the  behavioristic  and 
psychological  functions  which  a  brain  model  should  be  able  to  demonstrate. 

Phenomena  of  retention  and  adaptation  in  organisms  have  been 
studied  in  a  variety  of  experiments,  varying  greatly  in  their  design.  In 
traditional  usage,  "memory"  experiments  have  been  concerned  more  with
the  retention  and  recall  of  experience,  while  "learning"  experiments  are 
concerned  with  the  acquisition  and  modification  of  behavior.  Both  types  of 
investigation,  however,  are  concerned  with  lasting  modifications  in  the  state 
of  the  organism,  and  in  complicated  problems  (e.g.,  those  involving 
"insight")  one  tends  to  merge  into  the  other;  accordingly,  all  of  these 
experiments  will  be  considered  together  in  this  discussion.

Quantitative  studies  of  learning  and  memory  in  psychology
stem  from  the  classical  experiments  of  Ebbinghaus,  in  1885,  on  the  learning 
and  retention  of  nonsense  syllables.  Using  himself  as  a  subject,  he  obtained 
learning  and  forgetting  curves,  and  demonstrated  many  of  the  phenomena  of 
recognition  and  retention  which  have  interested  psychologists  ever  since. 
Related  phenomena  have  been  studied  by  Bartlett  (Ref.  5)  using  more  highly
organized  material.  A  second  type  of  experiment,  the  conditioned  reflex 
experiment,  first  employed  by  Pavlov,  is  characterized  by  the  association 
of  an  existing  response  to  a  new  stimulus,  which  did  not  evoke  the  response 
prior  to  the  conditioning  procedure.  A  third  type  of  experiment,  employed 
originally  by  Thorndike  and  recently  studied  extensively  by  Skinner  and 
others,  is  concerned  with  the  learning  of  a  pattern  of  behavior  which  is 
instrumental  to  the  solution  of  a  problem,  or  which  satisfies  a  drive. 

Where  such  problem-solving  behavior  appears  to  depend  in  a  crucial  way 
upon  a  "cognitive  restructuring"  of  the  situation,  or  the  formation  of  a  new 
"concept",  we  have  an  experiment  in  "insight"  or  "concept  formation",  as 
in  the  studies  of  the  Gestalt  psychologists. 

It  is  possible  that  these  three  types  of  experiments  are  actually 
demonstrating  fundamentally  different  mechanisms  of  learning.  The  first 
deals  with  recognition  and  recall  of  previous  perceptual  experience;  the 
second  is  concerned  with  the  generalization  of  responses  from  initial
stimuli  to  new  stimuli  by  virtue  of  temporal  association;  the  third  is 
concerned  with  the  discovery  and  establishment  of  problem-solving  behavior. 
Still  other  experiments  deal  with  such  phenomena  as  short-term  memory 
span,  acquisition  of  needs  and  motives,  attitude  formation,  perfection  of  a
motor  skill,  or  learning  to  make  fine  perceptual  judgements.  Undoubtedly,
the  same  physiological  processes  are  tapped  in  many  of  these  tasks;  on  the
other  hand,  attempts  at  subsuming  all  of  them  under  a  set  of  general  "laws
of  learning"  do  not  seem  to  be  particularly  helpful  for  our  present  purpose.
From  the  standpoint  of  brain  model  construction,  it  seems  safest  to  regard 
each  type  of  learning  experiment  as  a  distinct  problem,  with  its  own  variables 
and  rules  of  behavior  which  we  hope  that  our  model  will  duplicate  under 
equivalent  experimental  conditions.  The  main  value  of  such  psychological 
experimentation,  then,  is  to  provide  us  with  a  set  of  "calibration  experiments", 
by  means  of  which  a  model  can  be  compared  with  known  organisms  under  well 
defined  conditions.  The  reader  who  is  unfamiliar  with  the  literature  of 
learning  experimentation  will  find  the  reviews  by  Hilgard,  Brogden,  and 
Hovland  (in  Ref.  112)  particularly  helpful.

In  a  number  of  experiments,  attempts  have  been  made  to  find 
the  actual  physiological  correlates  of  the  learning  or  memory  phenomenon. 
Notable  among  these  are  the  experiments  of  Penfield  (Ref.  68),  who  finds 
that  electrical  stimulation  of  selected  points  on  the  cortex  may  evoke  long 
and  vivid  sequences  of  past  experience,  apparently  with  hallucinatory  clarity. 
John  (Ref.  39)  has  recently  reviewed  experiments  in  cortical  conditioning,  and 
reported  a  number  of  interesting  results  of  his  own,  which  suggest  that 
memory  may  involve  modification  of  the  connections  between  the  deep  centers 
of  the  brain  stem  and  the  cerebral  cortex,  with  the  reticular  formation  playing 
a  particularly  significant  role.  The  experiments  of  Olds  (Refs.  64,  65,  66) 
on  the  reinforcing  effects  of  electrical  stimulation  applied  to  certain  points 
in  the  hypothalamus  and  adjacent  structures  suggest  that  these  may  be 
involved  in  the  motivational  aspect  of  learning.  Such  experiments,  which 
have  only  recently  become  possible  through  the  improvement  of  electro- 
physiological  techniques,  are  likely  to  become  increasingly  valuable  as 
guides  to  theory  construction.

3.1.6  Field  Phenomena  in  Perception


Early  studies  of  perception  were  largely  concerned  with  the 
absolute  question  of  what  perceptions  are  made  of;  such  studies  were 
concerned  with  range  and  sensitivity  of  sensory  abilities,  measurement  of 
limits  and  thresholds,  and  the  detailed  dissection  of  sensory  stimuli  into 
fundamental  components.  Such  studies  form  the  main  subject  matter  of 
classical  psychophysics.  In  psychology,  they  gave  rise  to  an  atomistic
approach  (reaching  its  ultimate  expression  in  the  work  of  Titchener)  in
which  it  was  proposed  that  any  phenomenon  of  perception  could  be  accounted 
for  by  a  proper  compounding  of  sensory  elements,  each  of  which  retains  its 
own  identity,  like  a  piece  of  tile  in  a  mosaic.  During  the  last  few  decades, 
largely  under  the  influence  of  the  Gestalt  psychologists,  studies  of  perception 
have  turned  from  the  question  of  the  constituents  of  perception  to  the  question 
of  the  conditions  under  which  a  given  perception  occurs.  It  is  now  generally
accepted  that  what  is  perceived  depends  not  only  upon  the  properties  of  the 
stimulus  object,  or  image,  which  is  recognized,  but  upon  the  organization 
of  the  entire  sensory  field  in  which  it  is  embedded.  This  is  true  not  only 
in  vision,  but  in  other  sensory  modalities  as  well. 

The  field  phenomena  which  have  been  studied  include  the  effects 
of  contrast,  figure-ground  organization,  frames  of  reference,  depth  perception,
size  constancy,  and  illusions.  The  reader  is  referred  to  Koffka  (Ref.  44)
and  Gibson  (Ref.  26)  for  detailed  discussion  of  these  topics.  For  present
purposes,  the  most  important  implication  of  this  work  is  that  a  physical
model  for  a  perceiving  system  must  permit  the  interaction  of  all  elements
in  a  spatially  organized  field.  It  is  not  sufficient  simply  to  detect  sets  of
elements  which  represent  a  "pattern";  the  perception  of  a  pattern,  and  the
interpretation  of  it,  depend  in  a  fundamental  way  on  metric  relationships
to  other  sense  data  from  the  same  modality,  and  correlations  with  sensory 
data  from  entirely  different  modalities.  The  perception  of  a  line  as  "upright", 
for  example,  depends  on  its  observed  angles  relative  to  visual  standards  of 
"uprightness",  such  as  the  corners  of  a  room,  and  also  upon  the  gravity 
senses  and  kinesthetic  data  which  provide  a  frame  of  reference  for  "up" 
and  "down".  The  decision  that  two  disjoint  patches  of  illumination  represent 
parts  of  the  same  object  rather  than  different  objects  depends  upon  their 
contrast  or  resemblance  to  the  field  structure  around  them,  as  well  as  on 
their  relationship  to  one  another.  It  is  possible  (as  Gibson  has  suggested) 
that  recognition  is  never  achieved,  in  biological  systems,  by  the  representation 
of  a  particular  receptor  configuration,  but  only  by  the  representation  of  sets 
of  relations  (angles,  ratios,  etc.)  as  its  elementary  data.  If  this  is  the 
case,  a  suitable  set  of  analyzing  mechanisms,  capable  of  measuring  such
variables,  must  be  included  in  the  pre-recognition  tracts  of  a  brain  model.

As  our  models  gain  in  sophistication,  it  is,  in  fact,  becoming  increasingly 
apparent  that  such  analyzing  mechanisms  are  essential  for  purposes  of 
efficiency  and  economy  of  design. 

The  perceptrons  to  be  considered  initially  will  not  possess 
intrinsic  field-organization  properties.  With  the  introduction  of  cross-coupled
systems,  such  properties  begin  to  emerge.  An  evaluation  of
these  systems  by  means  of  typical  "Gestalt  perception  experiments"  has 
barely  begun  at  the  present  time,  but  represents  one  of  the  most  important 
tasks  to  be  undertaken. 

3.1.7  Choice-Mechanisms  in  Perception  and  Behavior


Selective  attention  and  "set"  are  fundamental  phenomena  in 
the  control  of  psychological  activity.  They  indicate  mechanisms  for 
choosing  between  alternative  courses  of  action,  or  points  of  view,  and 
play  a  logical  role  analogous  to  the  selection  of  different  branches  in  a 
"flow  diagram"  of  a  digital  computing  routine.  Attention  and  psychological 
set  are  largely  determined  by  the  situational  context  in  which  behavior 
occurs,  and  by  the  current  "goals"  or  "purposes"  of  the  organism,  which 
may  be  thought  of  as  choices  of  a  superordinate  sort,  under  which  sub-decisions
are  made  to  select  particular  modes  of  activity.  For  example,
an  individual  who  is  set  to  look  for  a  word  in  a  dictionary  will  be  most 
attentive  to  the  sequence  of  letters  in  boldfaced  type,  while  someone  who 
is  looking  for  torn  pages  will  probably  be  unaware  of  the  particular  letter 
combinations,  and  someone  who  is  simply  scanning  the  volume  to  look  for 
pictures  is  apt  to  notice  neither  the  spelling  nor  the  condition  of  the 
pages.

The  importance  of  set,  or  attitude,  for  learning  has  been 
emphasized  by  Hebb  (Ref.  33),  but  choice  mechanisms  of  this  type  have 
rarely  been  incorporated  in  the  detailed  design  of  theoretical  brain 
models.  In  purely  logical  models  of  behavior,  they  play  a  considerably 
more  prominent  role  --  for  example,  in  Tolman's  learning  theory,  and 
in  Newell  and  Simon's  models  for  problem  solving  behavior  (Refs.  62,  63), 
selective  choice-mechanisms  are  specifically  designated.  In  a  brain
model,  it  is  clear  that  such  phenomena  must  be  closely  related  to  the 
problem  of  "temporary  memory",  since  the  set  under  which  the  brain 
is  currently  operating  must  be  represented  by  a  temporarily  stable,  but
nonetheless  readily  altered,  state  of  the  system,  capable  of  modifying
processes  which  go  on  while  it  persists.  It  seems  likely  (although  unsupported
by  any  direct  evidence)  that  pools  of  neurons  connected  by
reverberating  circuits  may  be  important  set-maintaining  devices  in  the 
nervous  system,  exerting  their  influence  on  the  brain  as  a  whole  by 
means  of  a  widely  distributed  barrage  of  sub-threshold  excitation  or 
inhibition.  The  plausibility  of  such  mechanisms  will  be  considered  in 
more  detail  in  a  later  chapter. 

3.1.8  Complex  Behavioral  Sequences 

The  discussion  of  psychological  sets  and  choice  mechanisms 
brings  us  to  a  consideration  of  even  more  highly  organized  behavior  and 
thought  patterns,  such  as  the  steps  taken  in  performing  an  arithmetic 
computation,  or  driving  to  work,  or  performing  a  piece  of  research. 

All  of  these  activities  represent  orderly  sequences  of  decisions  and  action, 
and  can  be  considered,  as  Newell  and  Simon  have  suggested,  as  programs 
to  be  performed.  In  some  cases,  these  programs  are  highly  stereotyped, 
and  determined  by  rigid  rules;  in  other  cases,  they  employ  chance 
mechanisms  and  heuristic  procedures.  Much  of  the  classical  psychological 
literature  on  problem  solving  and  insight  is  relevant  to  this  second  class 
of  programs,  while  a  rat  running  a  maze  might  be  considered  an  example 
of  the  first  type.  As  in  the  case  of  selective  attention  and  set,  these 
problems  have  not  been  dealt  with  in  detail  by  any  brain  models  proposed 
to  date,  but  it  seems  likely  that  at  this  level  the  brain  and  the  computer 
begin  to  approach  a  common  meeting  ground.  Problems  of  memory  span,
storage,  and  sequence  control  are  present  in  both  types  of  systems,  and
many  of  the  logical  problems  confronted  in  "heuristic  programming"
(Refs.  60,  62,  63)  seem  to  be  direct  translations  from  human  problem-solving
experience  to  the  language  of  computing  machines.  This  does
not  mean  that  the  physical  structure  of  a  brain  model  must  ultimately 
resemble  that  of  digital  devices,  but  rather  that  the  same  basic  logical 
organization  --  a  memory  for  programs,  a  memory  for  data,  and  a 
mechanism  for  the  sequential  performance  of  a  given  program  --  must  be 
available.  The  "programs"  themselves  presumably  take  the  form  of 
sequences  of  selective  sets,  or  bias  states,  arranged  in  a  hierarchical
manner,  so  that  sub-operations  are  performed  under  the  control  of  a 
"master  set"  or  "master  program"  which  determines  the  overall  plan  of 
activity.  While  the  detailed  properties  of  such  systems  must  necessarily 
remain  speculative  at  the  present  time,  we  shall  see  that  such  a  concept 
is  compatible  with  the  organization  of  perceptrons  not  too  far  removed 
in  complexity  from  those  which  we  are  now  capable  of  analyzing. 
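
The logical organization described above (a memory for programs and a mechanism for performing a given program in sequence, with sub-operations governed by a "master set") can be illustrated by a minimal Python sketch. Everything in it is an assumption made for illustration: the set names, the PROGRAM_MEMORY table, and the perform function are hypothetical, the data memory is omitted, and no claim is made that a brain model or a perceptron would be organized in this way.

```python
from typing import Dict, List

# Program memory: each named "set" lists the sub-sets or primitive acts
# performed under its control. All names are hypothetical illustrations.
PROGRAM_MEMORY: Dict[str, List[str]] = {
    "add_two_digit_numbers": ["add_units", "add_tens"],   # master set
    "add_units": ["read_units", "sum", "write_digit", "note_carry"],
    "add_tens": ["read_tens", "sum", "write_digit"],
}


def perform(master_set: str,
            program_memory: Dict[str, List[str]] = PROGRAM_MEMORY) -> List[str]:
    """Return the primitive acts in the order dictated by the master set.

    Assumes that no set refers back to itself, directly or indirectly.
    """
    acts: List[str] = []
    pending = [master_set]            # sequence control: sets still to perform
    while pending:
        current = pending.pop()
        steps = program_memory.get(current)
        if steps is None:
            acts.append(current)      # no entry: treat as a primitive act
        else:
            # Push sub-steps in reverse so the first one is performed next.
            pending.extend(reversed(steps))
    return acts
```

Calling perform("add_two_digit_numbers") yields the primitive acts of the units column followed by those of the tens column. A stack is used only because it is the simplest form of sequence control; other arrangements would serve equally well for the purpose of the illustration.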

3.2  Current  Issues

While  the  discussion  of  the  preceding  section  has  attempted 
to  stick  to  a  relatively  conservative  and  uncontroversial  rendition  of
physiology  and  psychology  as  it  applies  to  the  brain  model  problem,  it
is  clear  that  in  the  last  pages  we  have  been  drawn  into  increasingly
speculative  and  uncertain  areas  of  discourse.  In  this  section,  an
attempt  will  be  made  to  highlight  a  number  of  issues  which  seem  most 
salient  in  determining  the  fate  of  various  brain  models,  and  which  are 
not  answerable  at  the  present  time  outside  the  realm  of  speculation. 


Of  necessity,  a  physical  model  will  have  to  take  a  stand  on  most  of  these 
issues,  and  it  is  possible  that  by  investigating  the  logical  consequences  of 
such  a  stand,  a  decision  as  to  the  plausibility  of  various  alternatives  might 
be  made;  the  brain  model  approach  has  a  chance,  here,  of  providing  answers
which  empirical  studies  have  so  far  been  unable  to  discover.  In  any  event,
the  decisions  taken  on  these  issues  represent  the  points  at  which  a  brain
model  is  most  vulnerable  to  future  attack,  as  new  evidence  is  uncovered. 

3.2.1  Elementary  Memory  Mechanisms: 

The  status  of  current  information  on  basic  memory  mechanisms 
in  the  nervous  system  has  been  reviewed  recently  by  Burns  (Ref.  13).  Most 
brain  models  employ  some  memory  hypothesis,  but  evidence  as  to  the  nature 
of  actual  physiological  mechanisms  which  might  be  involved  is  almost 
totally  lacking.  It  is  generally  agreed,  simply  on  the  basis  of  definition, 
that  whatever  we  call  "memory"  involves  a  modification  of  neural  activity
in  the  central  nervous  system  or  its  output  signals,  as  a  function  of 
exposure  to  previous  events  or  "experience".  In  some  models,  this 
modification  has  been  attributed  to  persistent  activity  in  closed  loops  of 
neurons,  but  most  theorists  are  now  agreed  that,  while  such  a  memory
mechanism  might  account  for  "short  term  memory",  and  might  play  a 
significant  role  in  the  establishment  of  more  permanent  memory  traces, 
there  must  also  exist  a  non-volatile  memory  mechanism  (e.g.,  a 
structural  or  chemical  change)  which  can  outlast  periods  of  neural  inactivity,
and  is  relatively  insensitive  to  transient  activity  in  the  nervous
system  (see  Hebb,  Ref.  33,  pp.  12-16).  The  nature  of  this  memory  trace
mechanism,  it  is  generally  agreed,  must  be  such  as  to  facilitate  the  use
or  selection  of  neural  pathways  which  have  been  active  at  the  time  of  the
"remembered"  experience  or  behavior,  and  virtually  all  specific  models
assume  that  it  takes  the  form  of  a  facilitation  of  connections  between  sources
of  excitation  and  responding  neurons  in  the  motor  system  or  CNS.  In
making  such  an  assumption,  the  influence  of  the  conditioned  reflex  model, 
which  suggests  that  sensory  neurons  become  coupled  to  association  neurons, 
by  which  they  are  connected  to  motor  neurons,  is  clearly  evident.  An 
alternative  position,  in  which  the  preferred  pathways  "win  out"  by  surviving 
deteriorative  changes  in  unused  pathways,  rather  than  by  active  facilitation, 
has  not  been  explored  to  any  significant  degree,  but  appears  to  be  logically
similar  in  its  potentialities.

Granting  that  the  memory  mechanism  takes  the  form  of  some 
means  of  selecting  particular  patterns  of  activity  in  preference  to  others, 
depending  upon  the  input  or  current  state  of  the  nervous  system,  particular 
physiological  models  include:  (1)  mechanisms  for  reconstituting  past  activity
states  of  the  entire  CNS  or  a  major  portion  of  it;  (2)  mechanisms  for  selecting
particular  output  channels  as  a  function  of  current  activity  or  sensory  inputs. 
The  specific  mechanisms  proposed  generally  fall  into  one  of  the  following 
four  categories: 

(1)  Extracellular  influences  and  modification  of  the  neural  medium: 
This  has  been  proposed  by  Kohler  (Ref.  45),  Bok  (Ref.  8),  and  others,  who 
assume  that,  if  a  "structural  trace"  is  present  at  all,  it  is  not  laid  down  in 
specific  neurons,  but  in  the  surrounding  medium,  where  it  is  capable  of 
modifying  activity  in  nearby  neural  tracts.  The  possible  form  that  such 
a  mechanism  might  take  has  never  been  specified  in  detail,  and  the  approach 
is  generally  discounted  by  current  theorists.  The  motivation  for  such  a 
hypothesis  comes  in  part  from  attempts  at  preserving  the  isomorphism
between  a  spatially  distributed  memory  trace  and  spatially  organized 
visual  events,  as  in  Kohler's  system.  While  it  is  not  implausible  to  assume 
that  the  surrounding  medium  participates  in  the  memory  trace  structure, 
it  seems  likely  that  such  interaction  between  medium  and  neurons  would 
be  highly  localized,  probably  influencing  only  a  single  neuron  or  synaptic 
junction,  rather  than  forming  a  widespread  organized  structure  independent 
of  the  neurons  themselves.  If  such  a  position  is  accepted,  then  whatever  is
left  of  this  approach  can  be  subsumed  under  one  or  another  of  the  remaining 
neural  modification  mechanisms. 

(2)  Threshold  Modification:  The  hypothesis  that  the  threshold
of  an  active  neuron  may  be  reduced  as  a  consequence  of  the  activity,  thus 
making  it  more  likely  that  this  cell  will  respond  to  future  stimuli,  has 
frequently  been  proposed  as  a  possible  memory  mechanism  (cf.  Taylor,
Ref.  99).  If  we  take  the  "threshold",  in  its  conventional  sense,  to  mean
the  degree  of  membrane  depolarization  or  the  level  of  input  excitation 
which  will  cause  the  neuron  to  discharge,  regardless  of  the  particular 
synapses  involved  in  the  transmission  of  excitation,  then  this  model
meets  two  main  objections:  first,  the  sensitivity  which  is  acquired  is  non-specific,
making  it  more  likely  that  the  cell  will  respond  to  any  input,  rather
than  just  those  which  were  effective  at  the  time  that  the  memory  trace  was
established;  second,  after  a  long  history  of  activity,  we  would  expect  the
thresholds  of  all  neurons  to  be  reduced  to  a  minimum  level,  unless  some 
recovery  mechanism  exists.  If  such  a  recovery  mechanism  does  exist, 
memory  will  tend  to  be  lost  as  a  consequence,  and  it  must  be  shown  that
the  rate  of  forgetting  would  not  vitiate  the  value  of  the  system.  Occasionally,
the  concept  of  "threshold  reduction"  seems  to  be  used  in  the  sense  of  an 
increase  in  specific  sensitivity  of  a  neuron  to  a  particular  afferent  fiber. 

In  this  case,  the  threshold  reduction  mechanism  becomes  indistinguishable 
from  a  synaptic  facilitation  mechanism,  which  is  considered  below. 
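
The two objections raised against threshold modification in its conventional sense can be illustrated by a minimal Python sketch. It assumes a single neuron whose threshold falls by a fixed step each time it discharges, down to a fixed floor, with no recovery mechanism. The class name and all numerical values are hypothetical and are not taken from any of the models cited.

```python
from dataclasses import dataclass


@dataclass
class ThresholdNeuron:
    """Neuron whose only memory is its firing threshold (illustrative only)."""
    threshold: float = 1.0   # hypothetical initial threshold
    step: float = 0.125      # hypothetical reduction per discharge
    floor: float = 0.25      # hypothetical minimum threshold

    def stimulate(self, excitation: float) -> bool:
        """Apply total input excitation, regardless of which synapses carry it."""
        fired = excitation >= self.threshold
        if fired:
            # Activity lowers the threshold; nothing ever raises it again.
            self.threshold = max(self.floor, self.threshold - self.step)
        return fired


neuron = ThresholdNeuron()
weak_before = neuron.stimulate(0.5)      # weak, unrelated input: no discharge
for _ in range(10):
    neuron.stimulate(1.0)                # repeated strong input lowers threshold
weak_after = neuron.stimulate(0.5)       # the same weak input now discharges
```

Under these assumptions the weak input, which played no part in establishing the trace, becomes effective (the first objection), and the threshold comes to rest at its floor after a sufficient history of activity (the second).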

(3)  Strengthening  of  active  neurons:  Eccles  (Ref.  18),  Uttley
(Ref.  102),  and  Rosenblatt  (Ref.  79)  have  proposed  models  in  which  the 
output  signals  of  a  frequently  active  neuron  gain  in  strength  or  effectiveness, 
affecting  all  terminals  alike.  This  model  retains  the  specificity  of  response 
of  a  neuron  (unlike  the  threshold  reduction  model)  but  increases  its  power 
to  activate  the  neurons  which  follow  it  in  series.  If  the  output  signal  from 
a  neuron  goes  to  a  single  destination  only,  this  is  equivalent  to  a  model  which 
strengthens  particular  synaptic  connections.  If  the  output  goes  to  a  number 
of  different  locations,  however,  there  is  a  lack  of  specificity  in  the  channel-selection
properties  of  this  mechanism,  which  must  generally  be  offset  by
auxiliary  hypotheses.  In  Rosenblatt  (Ref.  79)  it  is  shown  that  by  means  of  a
suitably  organized  feedback  mechanism,  a  particular  output  channel  can  be 
selected  through  a  statistical  bias.  The  feedback  guarantees  that  those  cells
which  are  reinforced  all  have  at  least  one  "desirable"  output  connection,  the 
other  connections  being  distributed  at  random  among  a  large  number  of 
alternative  terminal  neurons,  each  of  which  consequently  receives  only  a 
fraction  of  the  total  reinforcement  applied.  While  such  a  model  is  shown 
to  be  logically  workable,  the  specific  feedback  connections  required  make 
it  physiologically  implausible,  and  it  remains  less  efficient  than  a  model 
in  which  specific  synapses,  rather  than  total  neurons,  are  selected  for 
modification.
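
The statistical bias described above can be illustrated by a small counting sketch in Python. It assumes that every reinforced cell has one connection to a common "desired" terminal neuron and a few further connections drawn at random from a large pool of alternatives, and that reinforcing a cell adds one unit of strength at each of its terminals alike. The function name, the parameter values, and the unit of reinforcement are hypothetical; the sketch is not the model of Ref. 79.

```python
import random
from collections import Counter


def reinforcement_received(n_cells: int = 50,
                           n_alternatives: int = 100,
                           extra_outputs: int = 3,
                           seed: int = 0) -> Counter:
    """Count the reinforcement reaching each terminal neuron.

    Every reinforced cell connects to the "desired" terminal and to
    `extra_outputs` alternatives chosen at random (hypothetical wiring).
    """
    rng = random.Random(seed)
    received: Counter = Counter()
    for _ in range(n_cells):
        targets = ["desired"] + rng.sample(range(n_alternatives), extra_outputs)
        for target in targets:
            received[target] += 1    # strengthening affects all terminals alike
    return received


totals = reinforcement_received()
desired_total = totals["desired"]
largest_alternative = max(v for k, v in totals.items() if k != "desired")
```

The desired terminal receives one unit from every reinforced cell, whereas each alternative receives only what chance assigns to it, so that the desired channel is favored without any modification of individual synapses.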

(4)  Modification  of  selected  synapses:  This  model  has  been
employed  by  Culbertson  (Ref.  17),  Hebb  (Ref.  33),  and  others,  and  is
employed  in  most  current  perceptron  models.  The  mechanism  takes  account 
of  the  correlation  of  activity  between  an  afferent  synapse  and  the  efferent 
neuron,  augmenting  the  strength  of  the  synaptic  ending  (or,  equivalently, 
the  sensitivity  of  the  sub-synaptic  membrane)  if  the  correlation  is  positive, 
and,  in  some  cases,  diminishing  it  if  the  correlation  is  negative.  The 
actual  physiological  process  by  which  such  a  correlation  might  occur  is 
obscure,  but  the  logical  advantages  of  such  a  mechanism  are  clear.  Hebb 
has  proposed  that  actual  synaptic  growth  might  occur,  improving  the  contact 
between  the  transmitting  and  receiving  neuron.  While  Eccles  has  considered 
possible  synaptic  growth  mechanisms  in  some  detail  (Ref.  18),  there  is  little
evidence  to  support  this  conjecture.  A  possible  biochemical  mechanism  has 
been  proposed  by  this  writer  (Ref.  83),  which  assumes  that  large  molecules 
used  as  catalysts  for  the  production  of  transmitter  substances  in  the  endbulb 
must  originate  from  the  nucleoplasm  of  the  post-synaptic  cell,  and  that  the 
exchange  of  these  molecules  is  facilitated  by  membrane  depolarization  and 
periods  of  activity  in  both  cells.  An  alternative  possibility,  in  which  the  memory
mechanism  is  entirely  contained  within  the  post-synaptic  cell,  is
that  a  persistent  sensitization  of  the  subsynaptic  membrane  in  the  neighborhood
of  an  active  synapse  occurs,  given  the  hypermetabolic  state  which
follows  activity.  The  facilitation  of  a  neuron's  response  to  repeated  sub-threshold
signals  which  has  been  reported  by  Bullock  (Ref.  11)  indicates
that  a  localized  persistent  effect  of  the  sort  hypothecated  does  exist;  it
remains  to  be  shown  that  the  subsequent  firing  of  the  neuron  may  serve
to  "stamp  in",  or  fix  in  a  more  permanent  manner,  the  temporary  sensitivity
which  has  been  observed.
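
The logical form of a synapse-specific mechanism, as distinct from its physiology, may be sketched in Python as follows. The sketch adopts one possible reading of "correlation": a synapse is augmented when it is active and the neuron discharges, and diminished when exactly one of the two is active. The class name, the binary inputs, and the numerical values are hypothetical, and the rule is not claimed to be that of any particular model cited above.

```python
from typing import List


class SynapticNeuron:
    """Neuron whose memory resides in individual synapses (illustrative only)."""

    def __init__(self, weights: List[float], threshold: float = 1.0,
                 gain: float = 0.25, decay: float = 0.125) -> None:
        self.weights = list(weights)
        self.threshold = threshold
        self.gain = gain      # hypothetical increment for positive correlation
        self.decay = decay    # hypothetical decrement for negative correlation

    def responds(self, inputs: List[int]) -> bool:
        """Report whether the weighted input reaches threshold (no learning)."""
        total = sum(w * x for w, x in zip(self.weights, inputs))
        return total >= self.threshold

    def train(self, inputs: List[int]) -> bool:
        """Present a stimulus and modify each synapse by its own correlation."""
        fired = self.responds(inputs)
        for i, x in enumerate(inputs):
            active = (x == 1)
            if active and fired:
                self.weights[i] += self.gain          # positive correlation
            elif active != fired:
                # Exactly one of synapse and neuron was active.
                self.weights[i] = max(0.0, self.weights[i] - self.decay)
        return fired


neuron = SynapticNeuron([0.5, 0.5, 0.5, 0.5])
for _ in range(3):
    neuron.train([1, 1, 0, 0])       # only the first two synapses are active
```

After this training the neuron responds to either of the first two inputs alone, but not to the last two together; the sensitivity acquired is specific to the synapses which were effective when the trace was established, in contrast to a reduction of the threshold.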

The  evaluation  of  a  particular  memory  hypothesis  must  depend,
at  this  stage,  upon  its  logical  power  when  employed  in  specific  brain  models, 
as  well  as  its  physiological  plausibility.  The  mechanisms  which  are  considered
in  this  report  have  been  selected  for  their  simplicity  and  their  demonstrated
ability  to  yield  interesting  behavioral  results.  They  suggest  plausible
directions  in  which  to  look  for  a  physiological  mechanism,  but  it  remains
possible  that  the  actual  mechanisms  employed  by  the  brain  may  be  of  a  drastically
different  sort.  It  is  fundamental  to  this  approach  that  any  lasting
change  in  the  system,  whatever  its  physical  form,  may  act  functionally  as  a 
memory  trace.  It  seems  likely  that  there  is  not  a  single  memory  mechanism 
or  even  only  two  memory  mechanisms  at  work  in  the  brain,  but  rather  a 
great  number  of  dynamic  processes,  ranging  from  temporary  facilitation 
and  fatigue  effects  to  permanent  structural  changes,  all  of  which  contribute 
in  some  way  to  the  observed  psychological  phenomena  called  "memory". 
Among  these  processes,  it  is  likely  that  one  or  two  play  an  outstanding  role, 
but  likely  candidates  have  not  yet  been  found,  and  in  the  meantime,  it  seems 
wise  to  retain  an  open  mind  on  the  entire  question. 

3.2.2  Memory  Localization


There  is  hardly  any  more  agreement  on  the  question  of  where 
memory  traces  are  to  be  found  (in  the  gross  anatomy  of  the  nervous 
system)  than  there  is  on  the  question  of  what  they  consist  of.  Lashley
(Ref.  49)  was  largely  responsible  for  the  emphasis  on  "distributed  memory" 
among  many  theorists  over  the  last  few  decades,  and  Sperry  (Ref.  95)  has 
contributed  a  number  of  experiments  which  indicate  that  the  residual 
effects  of  learning  must  be  widely  dispersed  throughout  the  brain.  On  the
other  hand,  Penfield  (Ref.  68)  has  shown  that  specific  recall  may  be  evoked 
by  stimulation  of  specific  selected  points  in  the  cerebral  cortex.  E.  R.  John, 
in  a  model  which  is  supported  by  a  certain  amount  of  experimental  evidence 
(Ref.  39),  proposes  that  the  memory  traces  are  distributed  between  the 
thalamus  and  cortex,  involving  reverberating  circuits  and  feedback  loops 
between  these  two  regions  rather  than  being  localized  in  one  or  the  other  of 
them.


The  question  of  localization  is  of  less  importance  for  a  functional 
model  of  the  brain  than  is  the  question  of  mechanism;  as  long  as  we  assume 
that  it  is  the  network  topology,  rather  than  the  actual  anatomical  position  of 
neurons,  which  is  important  in  determining  the  brain's  logical  properties, 
there  is  no  reason  for  requiring  that  a  brain  model  resemble  the  biological 
system  in  its  spatial  organization.  The  indirect  implications  of  the  different 
theories  of  localization  are  of  considerable  importance,  however.  For  one 
thing,  the  view  that  the  brain  contains  its  memories  in  a  widely  dispersed,
intermingled  form,  suggests  a  mechanism  in  which  the  same  cells  participate
in  a  great  variety  of  different,  and  perhaps  totally  unrelated,  memory
organizations.  A  model  which  can  separate  distinct  memories  from  such  a 
multiply  overwritten  system  will  be  quite  different  in  character  from  one  in 
which  each  remembered  event  is  stored  in  its  own  distinct  location.  For 
another  thing,  the  apparent  complexity  of  memory-sites  which  may  interact 
in  the  recall  of  a  single  experience  or  association  (as  emphasized  in  John's 
work)  impresses  us  with  the  possibility  that  human  memory  may  be  a 
product  of  a  number  of  related  processes  and  mechanisms,  perhaps 
acting  in  a  complex  sequence  of  cause-and-effect,  rather  than  a  simple 
correlation  of  inputs  and  outputs. 

Again,  we  are  stuck  with  the  necessity  of  simplifying  for
lack  of  detailed  knowledge.  While  it  is  likely  that  memory  and  recall  in
the  human  nervous  system  involve  the  coordinated  activity  of  several  parts
of  a  complex  structure,  we  will  attempt,  at  the  outset,  to  see  what  psychological
properties  can  be  duplicated  by  a  system  in  which  memory  is  located
in  a  single  set  of  connections,  with  a  minimum  of  structural  differentiation. 
As  perceptrons  are  elaborated  into  more  highly  structured  models,  the 
question  of  which  connections  should  be  allowed  to  participate  in  memory 
processes  will  be  reconsidered,  and  alternative  systems  will  be  investigated. 

3.2.3  Isomorphism  and  the  Representation  of  Structured  Information 


Lashley,  Kohler,  Greene,  MacKay,  and  others  (Refs.  28,  45,  50, 
55,  56,  110)  have  dealt  with  various  aspects  of  the  problem  of  isomorphism 
between  the  representation  of  an  event  in  the  central  nervous  system  and  the 
physical  structure  of  the  event  in  the  outside  world.  In  the  naive  isomorphism 
of  Kohler,  it  is  required  that  the  representation  in  the  brain  should  actually 
have  a  spatial  structure  resembling  the  thing  that  it  represents;  in  the  more 
sophisticated  form  advocated  by  Greene,  it  is  sufficient  that  the  representation
should  have  a  logical  structure  (not  necessarily  spatial  in  its  physical
manifestation)  which  permits  it  to  be  broken  apart,  dissected,  and  reassembled
by  suitable  manipulations  or  attention-directing  processes,  in  a  way  which  is
related  to  the  parts,  surfaces,  or  aspects  of  the  real-world  phenomenon. 

While  some  such  structural  representation  seems  to  be  inescapable  in 
human  perception,  thinking,  and  imagery,  the  exact  form  that  this  might 
take  is  again  almost  totally  unknown.  This  is  essentially  the  problem  of 
determining  the  code  employed  by  the  brain  in  its  representation  of
perceptual  phenomena.  We  know  that  the  code  is  one  which  enables  us  to
recognize  parts,  relations,  symmetries,  and  other  organizational  features
which  might  be  lost  in  a  completely  arbitrary  representational  system  (such
as  a  code  which  assigns  binary  symbols,  in  sequence,  to  all  stimuli,  and 
then  lists  all  of  those  which  are  to  be  considered  as  "similar").  We  also 
know  that  there  are  parts  of  the  brain  (the  sensory  projection  areas)  in 
which  actual  spatial  organization  of  stimulus  patterns  is  retained.  We  do 
not  know,  however,  how  far  the  representational  code  must  go  in  the 
direction  of  spatial  isomorphism  in  order  to  account  for  the  organizational 
properties  of  experience.  As  usual,  we  shall  begin  with  a  simplification 
which  assumes  an  unstructured  coding,  but  it  seems  likely  that  this  will  have 
to  be  abandoned  in  order  to  deal  with  problems  of  figural  representation, 
perception  of  relations,  and  other  "gestalt  problems".  An  attempt  will  be 
made  in  this  report,  however,  to  show  that  the  required  structuring  for 
some  of  these  problems  may  be  acquired  by  adaptive  processes  and  need 
not  superficially  resemble  the  phenomena  which  are  represented. 
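
The contrast drawn in this section between an arbitrary code and a structured one can be made concrete by a small Python sketch. It assumes three tiny retinal patterns written as rows of characters; the patterns, the function names, and the choice of mirror symmetry as the relation of interest are all hypothetical illustrations. Under the structured representation the relation can be computed from the representation itself, while under the sequential binary code it must be listed separately.

```python
from typing import Dict, List, Set, Tuple

Pattern = Tuple[str, ...]   # rows of a small retinal image; "#" is illuminated

STIMULI: List[Pattern] = [
    ("#..", "#..", "##."),   # L-shaped figure
    ("..#", "..#", ".##"),   # its mirror image
    ("###", ".#.", ".#."),   # T-shaped figure
]


def sequential_code(stimuli: List[Pattern]) -> Dict[Pattern, str]:
    """Arbitrary code: assign binary symbols to the stimuli in sequence."""
    width = max(1, (len(stimuli) - 1).bit_length())
    return {p: format(i, f"0{width}b") for i, p in enumerate(stimuli)}


def mirror(pattern: Pattern) -> Pattern:
    """Structured code: the relation is computed from the pattern itself."""
    return tuple(row[::-1] for row in pattern)


def similarity_list(stimuli: List[Pattern]) -> Set[Tuple[str, str]]:
    """The explicit list the arbitrary code needs to record mirror pairs."""
    code = sequential_code(stimuli)
    return {(code[a], code[b])
            for a in stimuli for b in stimuli
            if a != b and mirror(a) == b}


codes = sequential_code(STIMULI)
listed_pairs = similarity_list(STIMULI)
```

Nothing in the symbols assigned by sequential_code distinguishes the mirror pair from any other pair of stimuli; the relation survives only because similarity_list was built from the spatial form before that form was discarded.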

3.2.4  Adaptive  Processes  in  Perception

Much  of  the  theoretical  work  on  brain  models  (Hebb,  Hayek, 
etc.)  has  been  concerned  with  processes  by  which  complex  perceptual 
organizations  can  be  "built  up"  out  of  sensory  fragments,  by  a  process 
of  learning  or  association.  Consequently,  the  question  of  adaptability, 
or  modifiability,  of  perception  is  of  paramount  importance  as  a  guide  in 
model  construction.  The  history  of  this  problem  has  recently  been 
reviewed  by  Hochberg  (Ref.  34).  Studies  of  "perceptual  learning"  have 
been  concerned  (1)  with  the  organization  of  given  perceptual  elements
into  "concepts",  or  "kinds  of  objects",  and  (2)  with  the  modification  of  the
perceptual  elements  or  "impressions"  themselves. 

(1)  The  first  type  of  experiment  is  concerned  with  the  discrimination,
rather  than  the  "appearance",  of  stimuli.  It  is  clear  that  much
recognition  and  discrimination,  as  in  the  learning  of  speech  sounds  in  a 
new  language,  is  highly  dependent  upon  learning.  Such  processes  typically 
involve  differentiation,  rather  than  synthesis  of  complex  patterns  out  of 
readily  identified  parts.  Another  important  part  of  perceptual  concept
formation  is  concerned  with  associating,  or  classifying,  readily  discriminable
patterns  or  symbols  having  the  same  significance  (such  as  a  Roman,
italic,  and  script  form  for  the  letter  "A").

(2)  On  the  other  hand,  there
are  a  number  of  studies  concerned  with  attempts  at  modifying  the  seemingly 
intrinsic  "appearance"  of  the  stimulus  itself.  Such  experiments  are  not 
concerned  with  refinements  in  discrimination  or  assignment  of  appropriate 
names  to  stimuli;  they  are  concerned  with  re-structuring  the  sensory  data 
at  a  considerably  more  "primitive"  level.  Such  experiments  include 
studies  of  figural  aftereffects  (Ref.  25),  ambiguous  figures  (Ref.  107),
the  effect  of  memory  upon  color  perception  (Ref.  10),  and  the  various 
experiments  performed  with  inverting  prisms  to  determine  whether  a 
human  subject  could  learn  to  perceive  normally  with  an  inverted  retinal 
field.  Work  with  animals  reared  in  darkness  and  exposed  to  the  light 
for  the  first  time  in  various  test  situations  has  been  reported  by  Riesen 
(Ref.  75),  and  Gibson  and  Walk  (Ref.  24)  have  conducted  experiments  with
infants  and  newborn  animals  to  determine  whether  depth  perception  is 
possible  prior  to  learning.  Other  data  have  been  collected  by  von  Senden  for 
congenitally  blind  human  subjects  to  whom  sight  is  restored  by  surgery 
(Ref.  106). 

In  general,  the  conclusions  of  this  work  seem  to  indicate  that
while  recognition,  in  the  sense  of  being  able  to  discriminate  and  assign  an 
appropriate  name  to  an  object,  is  largely  dependent  upon  experience,  the 
"subjective  appearance"  of  a  stimulus  is  relatively  inflexible,  and  in  some 
species,  at  least,  may  be  innately  given  by  the  structure  of  the  nervous 
system.  Sperry's  work  with  frogs,  for  example,  in  which  the  optic  nerves 
are  cut  and  then  allowed  to  rejoin  with  the  eyeballs  inverted,  suggests  that 
no  amount  of  relearning  can  compensate  for  so  drastic  a  change  (Ref.  94),
and  the  Gibson-Walk  experiments  support  the  assumption  of  a  highly 
developed  sense  of  depth  perception  in  many  mammals  from  birth.  To  a 
much  lesser  degree,  modification  of  visual  images  by  experience  is 
possible;  generally,  this  takes  the  form  of  persistent  field  interactions 
(as  in  figural  aftereffects)  rather  than  a  basic  reorganization  of  perceptual 
experience.  The  extent  to  which  perception  might  be  organized  by  adaptive 
processes  is  currently  unknown,  and  this  is  one  of  the  main  areas  in  which 
theoretical  brain  models  may  prove  helpful  to  psychology.

3.2.5  Influence  of  Motivation  on  Memory


In  psychological  learning  theories,  it  is  commonly  assumed 
that  a  "drive"  or  "motive"  must  be  present  in  order  for  an  animal  to 
learn.  Conditioned  reflex  experiments,  on  the  other  hand,  frequently  fail 
to  show  any  relationship  between  the  "motivation  state"  of  the  animal  and 
the  learning  process.  Speculation  about  the  role  of  motivation  in  perceptual 
learning  has  also  been  quite  extensive,  and  a  number  of  experiments  have 
been performed to test the learning of perceptual discriminations or
related  tasks  on  the  basis  of  "mere  repetition"  as  opposed  to  directed 
learning.  In  these  experiments,  it  is  often  hard  to  distinguish  between 
"attention" and "motivation", and the results are generally inconclusive.

It  seems  that  a  certain  amount  of  "incidental  learning"  does  indeed  occur, 
which  is  not  directly  relevant  to  the  goal  or  task  of  the  subject  at  the  time; 
the  actual  degree  of  motivation,  reward  or  punishment,  or  "reinforcement" 
that  may  have  been  involved,  however,  is  impossible  to  ascertain  in  any 
absolute  way.  For  the  brain  model  problem,  it  is  important  to  note  that 
there are some learning situations, at least, in which "reward and punishment"
can be used to control the acquisition of new responses; whether or
not  this  is  universally  the  case,  and  the  actual  physiological  mechanisms 
involved,  remain  open  questions  at  this  time.  It  should  be  remembered, 
however,  that  any  brain  model  which  relies  on  the  intervention  of  an  outside 
agent  or  experimenter  to  direct  the  learning  process  is  implicitly  taking  a 
stand  on  this  issue.  A  possible  compromise  is  found  in  the  approach  of 
Ashby  (Ref.  3)  where  the  brain  is  described  as  a  complex  homeostatic 
organization,  in  which  particular  "crucial  variables"  are  capable  of 
triggering  random  changes  in  organization  if  they  exceed  critical  limits; 
stabilization  of  behavior,  in  such  a  system,  is  not  a  result  of  learning 
from  reward,  but  is  due  to  the  cessation  of  disruptive  changes  which  occur 
when  the  system  makes  a  mistake.  The  main  difficulty  in  making  use  of 
this  approach  is  in  guaranteeing  that  changes  are  sufficiently  specific  and 
well-directed  so  that  the  organism  achieves  its  new  behavior  pattern  in  an 
economical and relatively direct fashion, rather than going on a random
walk through all possible alternatives before arriving at the required
solution.
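
The mechanism attributed to Ashby above can be illustrated with a toy simulation. The following is a minimal sketch, not a reconstruction of Ashby's homeostat: the one-variable linear dynamics, the single adjustable `gain`, the disturbance range, and the name `run_homeostat` are all assumptions introduced for illustration. The system never receives a reward; it merely stops changing once it happens upon an organization which keeps the crucial variable within bounds.

```python
import random


def run_homeostat(steps=2000, limit=10.0, seed=0):
    """Toy homeostatic system (illustrative assumption, not Ashby's design).

    A single "crucial variable" x evolves as x <- gain * x + disturbance.
    Whenever |x| exceeds the critical limit, the organization (here just
    the gain) is replaced by a new random value.
    """
    rng = random.Random(seed)
    gain = rng.uniform(-2.0, 2.0)   # hypothetical adjustable parameter
    x = 1.0
    resets = 0
    for _ in range(steps):
        x = gain * x + rng.uniform(-0.1, 0.1)
        if abs(x) > limit:                 # critical limit exceeded
            gain = rng.uniform(-2.0, 2.0)  # random change in organization
            x = 1.0
            resets += 1
    return gain, x, resets


if __name__ == "__main__":
    gain, x, resets = run_homeostat()
    print("final gain:", gain, "crucial variable:", x, "resets:", resets)
```

With a single parameter the random search is short. If many parameters had to take suitable values jointly, the number of random reorganizations needed would be expected to grow rapidly, which is the difficulty noted in the text.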

3.2.6 The Nature of Awareness and Cognitive Systems


While it has been relegated by many theorists to the realm of
philosophy  or  semantics  rather  than  science,  the  question  of  the  nature  of 
consciousness  or  awareness  keeps  recurring  in  the  literature.  Current 
physiologists  and  psychologists  represent  the  whole  range  of  philosophical 
positions on this subject. For Eccles (Ref. 18) there is a conscious
"mind" which controls the body by acting upon the nervous system. For
Penfield  and  Jasper,  awareness  is  a  state  of  the  nervous  system  involving 
heightened  sensitivity  and  improved  coordination,  under  the  control  of  the 
centrencephalic system, and particularly the reticular formation (Ref. 38).
John (Ref. 39) suggests that "awareness may be a property arising from
the process of 'cortico-reticular resonance'". For Culbertson (Ref. 17),
consciousness is a property of trees of causal relations which tie together
the events of the external physical world and the neural events in the
brain.  Lotka  (Ref.  53)  has  suggested  that  we  look  to  the  world  of  molecular 
events  for  an  explanation,  and  that  consciousness  involves  particular 
unstable  states  of  molecular  or  atomic  particles. 

To  this  writer,  it  seems  likely  that  the  question  of  the  "nature 
of  awareness"  can  be  bypassed,  in  much  the  same  way  that  we  bypass  the 
question  of  the  "nature  of  perception",  by  concentrating  on  the  experimental 
and  psychological  criteria  which  may  be  used  to  distinguish  the  actual 
phenomena  in  question.  When  a  subject  reports  that  he  is  "conscious"  or 
that he was recently "unconscious", we are led to believe him or disbelieve
him on the basis of his behavior, and what he is able to report
about  the  content  of  his  "experience"  at  the  time  in  question.  From  an 
operational point of view, the fact of "consciousness" is closely connected
with  the  accessibility  of  information  and  its  ability  to  influence  overt 
behavior;  it  is,  in  fact,  meaningless  to  say  that  an  individual  is  "conscious" 
unless  there  is  something  that  he  is  conscious  of.  The  questions  which  can 
be  asked  concerning  this  phenomenon  in  a  theoretical  brain  model  (where  we 
are  not  free  to  assume  any  intrinsic  similarity  of  processes  to  those  in  the 
human  brain)  are  questions  of  what  can  be  discriminated,  "seen",  "attended 
to",  or  "remembered"  under  specified  conditions.  All  that  we  can  say, 
in  the  last  analysis,  is  that  the  system  acts  as  if  it  were  conscious,  leaving 
the  question  of  the  actual  existence  of  consciousness  in  the  system  for 
metaphysicists  to  consider. 

Systems  which  represent  information  internally,  in  such  a  way 
that  it  can  be  utilized  for  the  control  of  certain  kinds  of  responses  (such  as 
running,  thinking,  or  talking)  will  be  called  cognitive  with  respect  to  the 
realm  of  information  which  is  represented  and  the  class  of  responses  which 
this  information  controls.  Note  that  this  term  is  used  in  a  relative,  rather 
than  an  absolute  sense.  Thus  the  representation  of  information  in  the  form 
of  an  image  on  the  retina  is  not  sufficient  to  permit  us  to  say  whether  or 
not  the  organism  is  cognitive  with  respect  to  its  visual  environment;  we 
must  also  demonstrate  that  this  information  is  accessible  to  the  organism 
for  the  control  of  some  specified  set  of  responses.  We  might  say,  for 
example,  that  a  man  who  automatically  stops  for  a  red  light,  but  is 
unable  to  state  afterwards  why  he  stopped  is  cognitive  with  respect  to 
red signals at the level of overt motor-responses, but not at the level
of  verbal  recall.  Conversely,  an  unskilled  pianist  may  be  cognitive  with 
respect to errors in his performance at the verbal level, but not at
the  motor  control  level.  We  use  the  term  cognitive,  then,  to  indicate 
that  knowledge  of  some  realm  of  information  is  accessible  for  the  control 
of  some  specified  class  of  responses.  This  usage  permits  us  to  reserve 
judgement  on  the  definition  of  such  phenomena  as  perception  and  awareness, 
and  still  to  recognize  a  class  of  psychological  phenomena  involving  the 
accessibility  of  information,  with  which  we  shall  be  concerned. 

3.3 Experimental Tests of Performance

The  purpose  of  a  theoretical  brain  model  is  to  demonstrate 
how psychological phenomena can arise from a physical system of
known  structure  and  functional  properties.  In  the  preceding  sections  of 
this  chapter,  we  have  reviewed  the  physiological  data  which  suggest  the 
general  form  of  the  model,  and  the  psychological  data  against  which  its 
performance must be measured. We now turn to a more specific consideration
of the psychological tests which might be applied to a brain model
in  order  to  evaluate  its  performance,  and  to  compare  alternative  systems 
with  one  another. 

3.3.1  Discrimination  Experiments 

In the simplest type of experiment which can yield psychologically
significant information about a system, two distinct stimuli are
presented  to  the  model,  which  is  required  to  respond  differentially  to 
them.  In  the  general  case,  it  is  not  necessary  to  limit  this  experiment 
to  two  specific  stimuli  or  sensory  patterns;  two  or  more  classes  of 
patterns may be employed, each class consisting of "similar" patterns,
such  as  squares,  or  triangles,  or  various  sizes  and  styles  of  the  letter  "A". 
This experiment may be performed either to look for spontaneous discrimination
by the system, in the absence of intervention or guidance by the
experimenter,  or  to  study  forced  discrimination  in  which  the  experimenter 
attempts  to  teach  the  system  to  make  the  required  distinctions.  In  a 
learning  experiment,  a  perceptron  is  typically  exposed  to  a  sequence  of 
patterns  containing  representatives  of  each  type  or  class  which  is  to  be 
distinguished,  and  the  appropriate  choice  of  response  is  "reinforced" 
according  to  some  rule  for  memory  modification.  The  perceptron  is  then 
presented  with  a  test  stimulus,  and  the  probability  of  giving  the  appropriate 
response  for  the  class  of  the  stimulus  is  ascertained.  Different  results  will 
be  obtained,  depending  on  whether  or  not  the  test  stimulus  is  chosen  to 
correspond  identically  to  one  of  the  patterns  which  were  used  in  the 
training  sequence.  If  the  test  stimulus  is  not  identical  to  any  of  the  training 
stimuli,  the  experiment  is  not  testing  "pure  discrimination",  but  involves 
generalization  as  well.  If  the  test  stimulus  activates  a  set  of  sensory 
elements  which  are  entirely  distinct  from  those  which  were  activated  in 
previous  exposures  to  stimuli  of  the  same  class,  the  experiment  is  a  test 
of  "pure  generalization".  The  simplest  of  perceptrons,  which  will  be 
considered  initially,  have  no  capability  for  pure  generalization,  but  can 
be shown to perform quite respectably in discrimination experiments,
particularly  if  the  test  stimulus  is  nearly  identical  to  one  of  the  patterns 
previously  experienced. 
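
The three cases distinguished above depend only on how the set of sensory elements activated by the test stimulus overlaps the sets activated during training. The following sketch makes that bookkeeping explicit. It assumes that each stimulus is encoded as a set of indices of activated sensory elements; this encoding, the function name `classify_test`, and the label strings are illustrative choices and do not come from the text.

```python
def classify_test(test, training_same_class):
    """Name the kind of experiment implied by a test stimulus.

    `test` is the set of sensory elements activated by the test stimulus;
    `training_same_class` lists the sets activated by the training stimuli
    of the same class (an assumed encoding).
    """
    test = frozenset(test)
    training = [frozenset(s) for s in training_same_class]
    if test in training:
        # Identical to a pattern used in the training sequence.
        return "pure discrimination"
    if all(test.isdisjoint(s) for s in training):
        # Shares no sensory elements with any earlier exposure.
        return "pure generalization"
    # Not identical, but overlapping with earlier exposures.
    return "discrimination with generalization"


if __name__ == "__main__":
    seen = [{1, 2}, {3, 4}]
    for stimulus in ({1, 2}, {2, 3}, {5, 6}):
        print(sorted(stimulus), "->", classify_test(stimulus, seen))
```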

3.3.2 Generalization Experiments


As  indicated  above,  a  pure  generalization  experiment  is  one 
in  which  the  brain  model,  or  perceptron,  is  required  to  transfer  a  selective 
response  from  one  stimulus  (say,  a  square  on  the  left  side  of  the  retina) 
to  a  "similar"  stimulus  which  activates  none  of  the  same  sensory  points 
(a  square  on  the  right  side  of  the  retina).  Generalization  of  a  weaker  sort 
may  be  demonstrated  if  we  simply  require  the  system  to  transfer  a 
response  to  members  of  a  class  of  similar  stimuli,  which  are  not  necessarily 
disjoint  from  the  one  which  has  been  seen  (or  heard  or  felt)  before.  As  in 
the  case  of  discrimination  experiments,  it  is  possible  to  study  either 
spontaneous  generalization,  in  which  the  criteria  for  similarity  are  not 
supplied  by  an  outside  agency  or  experimenter,  or  forced  generalization, 
in  which  the  experimenter's  concept  of  similarity  is  "taught"  by  means  of 
a  suitable  training  procedure.  Some  of  the  most  significant  problems  in 
brain  mechanisms  concern  generalization  phenomena,  and  particularly 
the  meaning  of  "similarity"  for  a  particular  kind  of  system.  In  common 
with  a  number  of  other  theorists  (e.g.,  Pitts  and  McCulloch,  Ref.  71), 
this  writer  will  assume  that  similarity  is  primarily  determined  by  a 
group  of  transformations  which  stimuli  may  undergo  in  a  particular 
physical  environment.  In  the  normal  physical  environment,  for  visual 
stimuli,  this  would  include  rigid  motions,  rotations,  size  changes, 
projective  transformations,  certain  types  of  distortions  or  continuous 
deformations,  and  changes  in  color  or  contrast.  A  number  of  more 
subtle  forms  of  similarity  (as  in  styles  of  architecture,  gestures  and 
mannerisms,  etc.)  are  presumably  due  to  association  of  events  into 
classes  at  a  higher  level  of  organization  than  we  are  concerned  with  at 
this point. It should be noted, however, that a perceptron which is taught
to  form  arbitrary  classes  of  stimuli  might  be  expected  to  generalize 
along  completely  arbitrary  or  abstract  dimensions,  "similarity  of  style" 
being  as  legitimate  a  candidate  for  a  basis  of  classification  as  "similarity 
of  shape".  In  the  simple  perceptrons,  we  will  find  that  "pure  generalization" 
does  not  occur,  although  an  apparent  generalization  of  responses  to  stimuli 
which  share  many  sensory  points  with  those  previously  experienced  can  be 
demonstrated.  In  this  report,  this  weak  form  of  generalization  will  be 
considered  under  "discrimination  phenomena",  the  term  "generalization" 
being reserved primarily for cases in which a mechanism for recognizing
actual  similarity,  rather  than  a  rough  approximation  to  identity,  is  involved. 
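
The difference between the weak, overlap-based form of generalization and pure generalization can be seen in a very small example. The sketch below is an assumption-laden simplification: it uses a single threshold unit connected directly to twelve sensory points and a plain error-correction rule, which is simpler than the perceptrons analyzed in this report, and the names `train` and `net_input` are hypothetical. Because only the weights of activated points are ever changed, a stimulus falling entirely on untrained points produces a net input of zero.

```python
def sign(x):
    """Return +1, 0, or -1 according to the sign of x."""
    return (x > 0) - (x < 0)


def net_input(weights, active):
    """Sum of the weights of the activated sensory points."""
    return sum(weights[i] for i in active)


def train(patterns, n_points, epochs=10):
    """Error-correction training of one unit (illustrative rule).

    `patterns` is a list of (set of active points, target) pairs,
    with target +1 or -1.
    """
    weights = [0] * n_points
    for _ in range(epochs):
        for active, target in patterns:
            if sign(net_input(weights, active)) != target:
                for i in active:          # only active points are modified
                    weights[i] += target
    return weights


# Two classes, both presented on the "left" part of a 12-point retina.
training = [({0, 1, 2}, +1), ({3, 4, 5}, -1)]
weights = train(training, n_points=12)

overlapping = {1, 2, 6}   # shares two points with the first training stimulus
disjoint = {6, 7, 8}      # shares no points with any training stimulus

if __name__ == "__main__":
    print("overlapping stimulus:", net_input(weights, overlapping))
    print("disjoint stimulus:   ", net_input(weights, disjoint))
```

The overlapping stimulus inherits a positive net input from the shared points, while the disjoint stimulus yields zero, so that no selective response is transferred to it.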

3.3.3  Figure  Detection  Experiments 


In  the  experiments  considered  above,  two  or  more  kinds  of 
stimuli are always employed, in order to avoid the trivial case in which
the  desired  response  is  automatically  evoked  by  any  stimulus  that  might 
occur.  Since  it  is  assumed  that  at  each  moment  of  time  exactly  one 
stimulus  is  present,  these  experiments  represent  a  "forced  choice" 
situation,  in  which  the  brain  model  is  obliged  to  give  one  of  several 
positive identifications in response to whatever it "sees". Such experiments
have their counterparts in animal and human experimentation,
and  permit  the  study  of  an  important  class  of  psychological  problems, 
involving  simply  structured  situations.  An  alternative  approach,  which 
has  been  less  studied  to  date,  is  to  give  the  system  the  task  of  searching 
for  a  particular  figure  in  a  sensory  field  which  may  or  may  not  contain  it. 
In  this  case,  the  system  is  asked  to  discriminate  between  "figure  present" 
and  "figure  absent",  and  is  typically  only  instructed  in  the  recognition  of 
one figure at a time. If the figure appears as a solitary object in an
otherwise empty field, the task is a relatively trivial one. If the figure
appears against a background, or as part of a complex of other patterns,
the problem takes on a new aspect of complexity. In the most important
case, this experiment permits us to study figure-ground organizing
tendencies  in  a  perceptron,  by  presenting  it  with  embedded,  or  ambiguous 
figures  which  can  be  recognized  as  representing  one  thing  if  the  field  is 
appropriately  structured,  and  a  different  thing  if  the  field  is  structured 
differently.  The  Gestalt  properties  of  "good  figure"  are  supposed  to 
determine  the  preference  of  a  human  observer  to  perceive  one  or  another 
of the possible figures in such a field. Detection experiments permit us
to  compare  the  preferences  and  rules  of  "good  figure"  in  a  perceptron 
with  those  of  human  subjects,  in  controlled  situations.  Perceptrons 
considered  to  date  show  little  resemblance  to  human  subjects  in  their 
figure-detection capabilities and gestalt-organizing tendencies. In Part IV
of  this  report,  some  speculations  concerning  the  development  of  such 
properties  in  more  sophisticated  perceptrons  will  be  presented. 

3.3.4  Quantitative  Judgement  Experiments 


Another  type  of  experiment  with  which  little  work  has  been 
done  to  date  involves  the  estimation  of  quantitative  properties  of  stimuli 
(size, distance, position, etc.) by perceptrons. It will be seen that simple
perceptrons are capable of learning to represent stimuli by a continuously
variable "analog" type of response. No work has been done to date, however,
to  investigate  such  questions  as  the  generalization  of  quantitative  judgement 
to  new  stimuli,  or  the  accuracy  which  can  be  achieved  in  specific  cases. 


For  more  advanced  systems,  an  important  problem  which  must  ultimately 
be faced is that of "perceptual constancies": the tendency in human subjects
to  perceive  size,  color,  or  other  metric  properties  of  a  stimulus  in  terms 
of  the  "actual"  physical  properties  of  the  object  rather  than  its  projection 
on  the  retina.  A  man,  for  example,  is  perceived  to  be  about  six  feet  tall 
regardless  of  whether  his  retinal  image  subtends  one  degree  or  fifteen 
degrees,  and  a  dish  appears  to  be  circular  in  form  regardless  of  whether 
its  retinal  image  is  a  true  circle  or  an  elongated  ellipse.  It  has  been 
demonstrated  in  many  psychological  experiments  that  such  phenomena 
are  not  based  simply  on  familiarity  with  the  particular  objects  involved; 
a  completely  unfamiliar  form,  seen  in  normal  physical  space,  is  perceived 
correctly,  in  terms  of  its  "true"  physical  properties,  except  under 
exceptional circumstances (cf. Gibson, Ref. 26).
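
The size-constancy example can be made concrete with ordinary visual-angle geometry. The relation used below, that an object of size h at distance d subtends an angle of 2 arctan(h / 2d), is standard geometry rather than anything stated in the text, and the function names are hypothetical. The sketch shows only what a system would have to compensate for; it says nothing about how constancy is actually achieved. Under this relation, a six-foot man subtends one degree at roughly 344 feet and fifteen degrees at roughly 23 feet.

```python
import math


def visual_angle_deg(size, distance):
    """Angle (degrees) subtended by an object of given size and distance."""
    return math.degrees(2.0 * math.atan(size / (2.0 * distance)))


def distance_for_angle(size, angle_deg):
    """Distance at which an object of given size subtends angle_deg."""
    return size / (2.0 * math.tan(math.radians(angle_deg) / 2.0))


def physical_size(angle_deg, distance):
    """Recover physical size from retinal angle and (known) distance."""
    return 2.0 * distance * math.tan(math.radians(angle_deg) / 2.0)


if __name__ == "__main__":
    for angle in (1.0, 15.0):
        d = distance_for_angle(6.0, angle)
        print(f"{angle:5.1f} deg -> distance {d:7.2f} ft, "
              f"recovered height {physical_size(angle, d):.2f} ft")
```

The same physical height is recovered from both retinal angles only because the distance is supplied; a system exhibiting constancy must obtain equivalent information from the stimulus field itself.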

3.3.5  Sequence  Recognition  Experiments 


In  the  above  experiments,  it  has  been  assumed  that  the  stimuli 
are  fixed,  temporally  invariant  patterns.  Analogous  problems  exist, 
involving  discrimination,  generalization,  figure  detection,  and  metric 
estimation for time-varying, or sequential, patterns of all sorts. While
static organization problems reach their greatest degree of complexity
in the visual modality, temporal organization becomes comparably
complex  in  the  auditory  field.  Speech  recognition  is  one  particularly 
important  case  to  be  investigated.  Problems  include  not  only  the 
recognition  of  particular  movements,  or  sequences,  but  the  segmentation 
of movement and sound patterns into figural units, words, or phrases as
well.  The  recognition  of  sequences  in  rudimentary  form  is  well  within  the 
capability of suitably organized perceptrons, but figural
organization and segmentation present problems which are just as serious
here  as  in  the  case  of  static  pattern  perception. 

3.3.6 Relation Recognition Experiments

In a simple perceptron, patterns are recognized before
"relations"; indeed, abstract relations, such as "A is above B" or "the
triangle is inside the circle" are never abstracted as such, but can only
be acquired by means of a sort of exhaustive rote-learning procedure, in
which  every  case  in  which  the  relation  holds  is  taught  to  the  perceptron 
individually.  At  the  present  time,  the  main  hope  for  the  abstraction  of 
relations  seems  to  lie  in  systems  which  are  capable  of  executing  a 
sequence  of  observations,  according  to  a  predetermined  plan,  in  which 
first  one  member  of  the  related  pair  is  observed  and  then  the  other,  the 
relationship  between  them  being  determined  by  the  sequence  of  "experience" 
during  the  shift  of  attention  from  the  first  to  the  second.  The  problem  of 
relation  recognition  is,  at  the  outset,  more  complex  than  those  previously 
considered,  since  it  requires,  by  its  very  nature,  the  ability  to  recognize 
and  attend  selectively  to  at  least  two  distinct  "parts"  of  a  total  organization, 
specifying, for example, which part is larger and which smaller, or which
part is "outside" and which "inside". The hypothesis that relation recognition
involves a sequence, or program, of observation means that it must
make  use  not  only  of  figure  organization  capabilities  (to  separate  the 
"parts"  referred  to)  but  of  sequence  recognition  and  sequential  control 
capabilities  as  well.  The  actual  experiments  by  which  relation  recognition 
can  be  detected  must  involve  at  least  two  components  (such  as  square  and 
triangle)  which  can  be  shown  in  such  a  way  as  to  exemplify  the  relationship 
or  not.  In  an  ideal  experiment,  the  system  would  be  trained  to  recognize 
the  relation  by  a  number  of  examples  with  stimulus  patterns  or  "parts" 
which do not resemble or intersect (in their retinal location) the test
patterns  which  are  employed  in  evaluating  the  performance.  If  the  perceptron 
can  then  indicate  correctly,  for  entirely  new  stimuli,  whether  or  not  the 
relation  holds,  it  will  be  considered  that  the  relation  has  been  abstracted 
by  the  system. 

3.3.7  Program-Learning  Experiments 


The  learning  of  sequences  of  behavior  is  the  counterpart  on  the 
response  side  of  the  problem  of  sequence  recognition.  The  problem  has 
been  discussed  in  detail  by  Lashley  (Ref.  50).  It  requires,  as  a  starting 
point,  the  ability  to  form  "selective  sets",  which  introduce  a  bias  to  give 
one of several alternative responses to a given stimulus. A capability of
this  sort  has  been  shown  to  exist,  to  some  degree,  in  relatively  simple 
perceptrons,  provided  there  is  a  feedback  path  from  the  response  units  to 
the  association  system  (Ref.  79).  To  date,  little  has  been  done  to  study  this 
capability  in  a  quantitative  fashion,  but  some  of  the  heuristic  arguments  will 
be  reviewed  in  Chapter  23.  One  of  the  most  important  applications  of  such 
a  capability  is  in  the  control  of  the  sequential  activity  involved  in  recognition 
of  relations,  and  the  "perceptual  exploration"  of  a  sensory  field.  Related 
phenomena,  in  which  this  capability  plays  a  central  part,  are  the  sequential 
control of speech, thinking, and complex behavior patterns. The representation
of problem solving activity in the human by heuristic programs has
been  studied  by  Newell,  Shaw,  and  Simon  (Refs.  62,  63),  and  it  seems 
likely  that  many  of  their  results  might  be  transferred  to  a  perceptron 
which is capable of program-controlled activity.

3.3.8 Selective Recall Experiments


While most of the experiments described above involve "memory"
in  the  sense  of  a  change  in  behavior  as  a  consequence  of  experience,  they  do 
not,  in  general,  require  substantive  recall,  of  the  sort  which  is  displayed 
when we describe a person whom we saw yesterday, or the location of furniture
in a house where we lived last year. In selective recall experiments,
the  system  is  required  to  produce  on  demand  information  relevant  to  a 
particular  time,  place,  or  subject.  This  involves  a  particular  case  of 
"selective  set"  mechanisms,  and  can  probably  be  demonstrated  in  most 
systems  which  are  capable  of  program-controlled  behavior. 

3.3.9  Other  Types  of  Experiments 

In  addition  to  the  experiments  considered  above,  we  might 
ultimately  wish  to  consider  experiments  in  abstract  concept  formation, 
the  formation  and  properties  of  a  "self  concept",  creative  imagery,  and 
other  higher-order  psychological  phenomena.  At  the  present  time,  these 
problems  seem  sufficiently  remote  from  the  capabilities  of  present 
perceptrons  that  we  need  not  consider  them  further  here.  Also  relegated 
to  the  future  is  the  consideration  of  such  psychological  phenomena  as 
perceptual  illusions,  figural  aftereffects,  and  related  phenomena,  even 
though  these  have  been  considered  primary  in  some  of  the  brain  models 
hitherto  advanced.  It  is  this  writer's  belief  that  these  phenomena  are  so 
likely  to  depend  on  inessential  details  of  brain  organization,  at  almost  any 
level  of  complexity,  that  it  would  be  a  mistake  to  try  to  rest  the  case  for 
or  against  a  particular  model  on  a  demonstration  that  it  can  duplicate  a 
particular kind of perceptual illusion. It seems more important, at this
stage,  to  account  for  "veridical  perception"  than  for  its  occasional  failures, 
particularly  since  these  are  currently  demonstrable  in  a  single  species  only, 
and may lack any generality whatsoever.

3.3.10  Application  of  Experimental  Designs  to  Perceptrons 


The  designs  considered  above  have  been  discussed  as  if  they 
were  actual  "flesh  and  blood"  experiments,  performed  with  real  physical 
systems.  In  the  study  of  perceptrons,  it  is  not  always  practical  or  necessary 
to  carry  out  such  experiments  in  reality;  the  important  thing  is  that  an  analysis 
of  a  given  model  should  always  be  carried  out  in  terms  of  an  experimental 
design  which  is  specified  in  sufficient  detail  so  that  it  could  be  carried  out 
if  the  system  were  actually  constructed. 

In practice, three main methods are employed in the study of
perceptrons:


(1)  Mathematical  analysis,  in  which  a  stimulus  environment, 
the  rules  for  stimulus  presentation  and  for  the  modification  of  the  perceptron's 
memory  state  are  clearly  specified.  The  object  of  such  analysis  is,  in 
general, to determine the probability of correct performance, or the probability
of achieving a given performance criterion, for a specified class of
systems.


(2) Digital simulation, in which the perceptron, its environment,
and  the  memory  modification  rules  are  all  represented  in  a  digital  computer 
program,  which  carries  out  the  required  operations  of  an  experiment  in 
step-by-step fashion, calculating the response of every neuron and connection
in  the  perceptron,  and  measures  the  performance  of  the  system.  Such  a 
program,  repeated  for  a  sufficient  sample  of  perceptrons  in  a  class,  yields 
much  the  same  type  of  information  as  is  obtained  from  a  mathematical 
analysis.  It  has  the  advantage  of  being  free  from  all  approximations  (which 
may  be  necessary  in  some  analyses)  but  is  less  likely  to  yield  important 
insights  into  the  lawful  relations  which  characterize  a  class  of  systems. 
Simulation  programs  are  most  valuable  as  an  exploratory  device,  and  for 
the  study  of  systems  of  such  complexity  that  an  exact  mathematical  analysis 
is impossible.

(3)  Study  of  physical  models,  involving  the  actual  construction 
of  a  hardware  device,  and  the  performance  of  the  indicated  experiments.  At 
present,  little  is  to  be  gained  from  the  study  of  actual  physical  models  which 
cannot  be  learned  from  the  other  two  methods,  but  as  successive  models  grow 
in  size  and  complexity,  and  as  means  are  found  for  the  inexpensive  construction 
of  electronic  models,  this  method  becomes  increasingly  important.  Its  main 
virtue  is  the  flexibility  and  adaptability  of  a  hardware  perceptron  to  new  types 
of  learning  experiments  and  procedures,  and  the  ability  to  use  ordinary 
physical  objects  and  environments  as  stimuli,  which  would  otherwise  involve 
a  great  deal  of  time  and  expense  in  computer  programming.  The  physical 
model  itself,  however,  is  apt  to  be  less  flexible  than  a  simulated  system, 
and  is  best  suited  for  "case  studies"  of  a  single  representative  system, 
rather  than  statistical  studies  of  a  class  of  systems. 
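
The second method, digital simulation repeated over a sample of systems, can be sketched as follows. This is a minimal illustration under stated assumptions, not the simulation programs used in this work: the class of systems (ten association units, each with three randomly placed excitatory or inhibitory connections and a threshold of 1), the error-correction rule, the two training stimuli, and all function names are hypothetical choices made for brevity.

```python
import random


def make_perceptron(n_sensory, n_assoc, fan_in, rng):
    """Draw one member of the class: each A-unit receives `fan_in`
    random connections, each excitatory (+1) or inhibitory (-1)."""
    return [[(rng.randrange(n_sensory), rng.choice((1, -1)))
             for _ in range(fan_in)]
            for _ in range(n_assoc)]


def a_activity(perceptron, stimulus, threshold=1):
    """Binary A-unit outputs for a stimulus (set of active S-points)."""
    return [1 if sum(sgn for s, sgn in unit if s in stimulus) >= threshold else 0
            for unit in perceptron]


def response(weights, acts):
    """Response of the single R-unit: +1 or -1."""
    return 1 if sum(w * a for w, a in zip(weights, acts)) > 0 else -1


def run_experiment(perceptron, train_set, test_set, epochs=10):
    """Train the A-to-R weights, then return the fraction correct on test."""
    weights = [0] * len(perceptron)
    for _ in range(epochs):
        for stimulus, target in train_set:
            acts = a_activity(perceptron, stimulus)
            if response(weights, acts) != target:
                weights = [w + target * a for w, a in zip(weights, acts)]
    correct = sum(response(weights, a_activity(perceptron, s)) == t
                  for s, t in test_set)
    return correct / len(test_set)


def estimate_performance(n_systems=50, seed=0):
    """Average score over a random sample of perceptrons from the class."""
    rng = random.Random(seed)
    train_set = [({0, 1, 2, 3}, 1), ({4, 5, 6, 7}, -1)]
    test_set = train_set      # test stimuli identical to training stimuli
    scores = [run_experiment(make_perceptron(8, 10, 3, rng), train_set, test_set)
              for _ in range(n_systems)]
    return sum(scores) / len(scores)


if __name__ == "__main__":
    print("estimated probability of correct response:", estimate_performance())
```

The average over the sample plays the role of the probability of correct performance which a mathematical analysis would seek to derive for the class as a whole.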

In most of the experiments considered in this report (which
are listed in Appendix D), human performance capabilities are sufficiently
well  known  to  permit  us  to  draw  conclusions  about  possible  comparisons 
between perceptrons and biological systems without further study. In
some of the proposed experiments, however (e.g., the figure organization
experiments described in 3.3.3), additional data may be required on human
performance  in  order  to  obtain  a  base-line  for  the  quantitative  evaluation  of 
perceptrons.  Thus  it  seems  likely  that  in  the  near  future,  a  program  in 
experimental  psychology  with  human  and  animal  subjects  may  be  a  necessary 
adjunct  to  the  evaluation  of  our  brain  models.  When  this  occurs,  the  models 
are,  in  effect,  being  used  as  predictive  devices,  capable  of  generating  data 
(probably  grossly  inaccurate  at  the  outset)  which  have  not  yet  been  actually 
observed  in  human  subjects.  The  ultimate  test  for  a  brain  model,  from  the 
standpoint  of  psychological  validity,  is  an  experiment  of  this  type,  in  which 
the  model  correctly  predicts  phenomena  which  have  yet  to  be  discovered  in 
biological  systems. 

4. BASIC DEFINITIONS AND CONCEPTS


This  chapter  is  devoted  to  basic  definitions  of  terms  which  will 
be  used  throughout  the  report.  It  is  recommended  that  the  reader  familiarize 
himself  with  this  terminology  in  a  general  way,  on  first  reading,  and  refer 
back  to  this  chapter  when  the  terms  are  reintroduced  in  the  subsequent  text. 

A  list  of  standard  symbols  will  also  be  found  in  Appendix  A. 

4.1  Signals  and  Signal  Transmission  Networks 


The  following  definitions,  which  are  not  specific  to  perceptrons, 
are  likely  to  be  helpful: 

DEFINITION 1: A signal may be any measurable variable, such as a
voltage, current, light intensity, or chemical concentration.
A signal is typically characterized by its amplitude, time,
and location.


DEFINITION 2: A signal generating unit is any physical element, or device,
capable of emitting a signal. The output signal of the unit
$u_i$ will be represented by the symbol $u_i^*$.

DEFINITION 3: A signal generating function is any function which defines
the amplitude of the signal emitted by a signal generating
unit.

DEFINITION 4: A connection is any channel (e.g., a wire or nerve fiber)
by which a signal emitted by one signal generating unit
(the origin) may be transmitted to another (the terminus).
A connection $c_{ij}$ is characterized by its origin and
terminal units ($u_i$ and $u_j$, respectively), and by a
transmission function* which determines the amplitude
of the signal induced at the terminus as a function of the
amplitude and time of the signal generated by the origin
unit. This signal will be symbolized by $c_{ij}^*(t)$.

DEFINITION  5:  A  signal  transmission  network  is  a  system  of  signal  generating 
units,  linked  by  connections. 

4.2  Elementary Units, Signals, and States in a Perceptron


A  perceptron  (which  will  be  defined  in  the  next  section)  is  a 
signal  transmission  network  containing  three  types  of  signal  generating 
units:  sensory  units,  association  units,  and  response  units.  These  units 
all  have  signal  generating  functions  which  depend  on  signals  originating 
elsewhere  in  the  network,  or  else  externally,  in  an  outside  environment. 
The signals upon which the generating function of a unit depends are called
the input signals to that unit. These units are defined here in a sufficiently
general manner as to include biological neurons as a special case. We shall
be chiefly concerned, however, with models which employ simplified versions
of such neurons.

* In previous reports, the term "transfer function" has been used for
the transmission function of a connection (Definition 4). Since "transfer
function" has a somewhat different meaning in control system theory and
elsewhere, it is avoided here, and the term "transmission function" is
preferred.

DEFINITION 6:  A sensory unit (S-unit) is any transducer responding to
physical energy (e.g., light, sound, pressure, heat,
radio signals, etc.) by emitting a signal which is some
function of the input energy. The input signal at time t
to an S-unit $s_i$ from the environment, W, is symbolized
$c_{wi}^*(t)$. The signal which is generated by $s_i$ at time
t is symbolized $s_i^*(t)$.

DEFINITION 7:  A simple S-unit is an S-unit which generates an output
signal $s_i^* = +1$ if its input signal, $c_{wi}^*$, exceeds a
given threshold, $\theta_i$, and 0 otherwise.

DEFINITION 8:  An association unit (A-unit) is a signal generating unit
(typically a logical decision element) having input and
output connections. An A-unit $a_j$ responds to the
sequence of previous signals $c_{ij}^*$ received by way of
input connections $c_{ij}$, by emitting a signal $a_j^*(t)$.

DEFINITION 9:  A simple A-unit is a logical decision element, which
generates an output signal if the algebraic sum of its
input signals, $\alpha_i$, is equal to or greater than a threshold
quantity, $\theta > 0$. The output signal $a_i^*$ is equal to $+1$
if $\alpha_i \ge \theta$ and 0 otherwise. If $a_i^* = 1$, the unit
is said to be active.

DEFINITION 10:  A response unit (R-unit) is a signal generating unit
having input connections, and emitting a signal which is
transmitted outside the network (i.e., to the environment,
or external system). The emitted signal from unit $r_i$
will be symbolized by $r_i^*$.

DEFINITION 11:  A simple R-unit is an R-unit which emits the output
$r^* = +1$ if the sum of its input signals is strictly
positive, and $r^* = -1$ if the sum of its input signals
is strictly negative. If the sum of the inputs is zero,
the output can be considered to be equal to zero or
indeterminate. (A physical unit which oscillates in
response to a zero signal would have the required
properties.)
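
The threshold rules of Definitions 7, 9, and 11 can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, and returning 0 for a zero input sum to the R-unit is an assumption (the definition permits either zero or an indeterminate output).

```python
def simple_s_unit(c_w, theta):
    """Simple S-unit (Def. 7): +1 if the input signal exceeds the threshold."""
    return 1 if c_w > theta else 0


def simple_a_unit(inputs, theta):
    """Simple A-unit (Def. 9): active (+1) if the algebraic sum of the
    input signals is equal to or greater than the threshold theta > 0."""
    alpha = sum(inputs)
    return 1 if alpha >= theta else 0


def simple_r_unit(inputs):
    """Simple R-unit (Def. 11): +1 for a strictly positive sum, -1 for a
    strictly negative sum. A zero sum is indeterminate; returning 0 is an
    assumption made for this sketch."""
    total = sum(inputs)
    if total > 0:
        return 1
    if total < 0:
        return -1
    return 0
```

Note the asymmetry between the unit types: the S-unit requires its input to exceed the threshold, while the A-unit fires when the sum merely reaches it.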

DEFINITION 12:  Transmission functions of connections in a perceptron
depend on two parameters: the transmission time of the
connection, $\tau_{ij}$, and the coupling coefficient or value
of the connection, $v_{ij}$. The transmission function of
a connection $c_{ij}$ from $u_i$ to $u_j$ is of the form:
$c_{ij}^*(t) = f[v_{ij}(t), u_i^*(t - \tau_{ij})]$. Values may be
fixed or variable (depending on time). In the latter
case, the value is a memory function.

DEFINITION 13:  The activity state of the network at time t is defined
by the set of signals, $u_i^*(t)$, emitted by all signal
generating units at time t.

DEFINITION 14:  The memory state of a network is the configuration of
values associated with all (variable valued) connections
at a specified time.

DEFINITION 15:  The phase space of a network is the space of all possible
memory states, for a given network. In general, if there
are N variable-valued connections in the network, the phase
space may be represented by a region in Euclidean N-space,
each coordinate corresponding to the value of one connection.
The memory state of the system at any specified time can
be characterized by a point in this phase space, and the
history of the system by a directed line, or path, followed
by this point.

DEFINITION 16:  The interaction matrix for a network of S, A, and R units
is the matrix of coupling coefficients, $v_{ij}$, for all pairs
of units, $u_i$ and $u_j$. If there is no connection from
$u_i$ to $u_j$, $v_{ij}$ is defined as zero. Specifying an
interaction matrix is equivalent to specifying a point in
the phase space.

4.3  Definition and Classification of Perceptrons

DEFINITION 17:  A perceptron is a network of S, A, and R units with a
variable interaction matrix V which depends on the
sequence of past activity states of the network.

DEFINITION 18:  The logical distance from unit $u_i$ to $u_j$ is equal to the
number of connections in the shortest path by which a
signal can be transmitted from $u_i$ to $u_j$.

DEFINITION 19:  A series-coupled perceptron is a system in which all
connections originating from units at logical distance d
from the closest S-unit terminate on units at logical
distance d+1 from the closest S-unit.

DEFINITION 20:  A cross-coupled perceptron is a system in which some
connections join units of the same type (S, A or R)
which are at the same logical distance from S-units,
all other connections being of the series-coupled type.

DEFINITION 21:  A back-coupled perceptron is a system in which at least
one A or R unit at a distance $d_1$ from the closest
S-unit is the origin of a connection back to an S-unit
or to an A-unit at a distance $d_2 < d_1$ from the closest
S-unit; i.e., this is a system with feedback paths from
units located near the output end of the system to units
closer to the sensory end.
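
Logical distance (Definition 18) is a shortest-path count, so it can be computed by a breadth-first search started from all S-units at once. The sketch below is one way to do this and to test the series-coupled condition of Definition 19. The unit names, the edge-list representation, and the decision to treat units unreachable from any S-unit as violating the condition are assumptions of the example.

```python
from collections import deque


def logical_distances(connections, s_units):
    """Return {unit: logical distance from the closest S-unit}.

    `connections` is a list of (origin, terminus) pairs; the distance is
    the number of connections on the shortest directed path (Def. 18).
    """
    successors = {}
    for origin, terminus in connections:
        successors.setdefault(origin, []).append(terminus)

    dist = {s: 0 for s in s_units}
    queue = deque(s_units)
    while queue:  # multi-source breadth-first search
        unit = queue.popleft()
        for nxt in successors.get(unit, []):
            if nxt not in dist:
                dist[nxt] = dist[unit] + 1
                queue.append(nxt)
    return dist


def is_series_coupled(connections, s_units):
    """Def. 19: every connection leads from distance d to distance d + 1."""
    dist = logical_distances(connections, s_units)
    return all(
        o in dist and t in dist and dist[t] == dist[o] + 1
        for o, t in connections
    )


# Hypothetical three-layer network: two S-units, two A-units, one R-unit.
series = [("s1", "a1"), ("s2", "a1"), ("s2", "a2"), ("a1", "r"), ("a2", "r")]
back = series + [("r", "a1")]  # adds a feedback path from R to an A-unit
```

Here `is_series_coupled(series, ["s1", "s2"])` holds, while the extra connection in `back` runs from distance 2 to distance 1 and so fails the test, as a back-coupled system should.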

It should be noted that the above definitions are not exhaustive;
they are intended to designate certain generic classes of perceptrons with
which we shall be concerned. The initial models to be considered are of the
type specified by the following definitions:


DEFINITION 22:  A simple perceptron is any perceptron satisfying the
following five conditions:

1.  There is only one R-unit, with a connection
from every A-unit.

2.  The perceptron is series-coupled, with connections
only from S-units to A-units, and from A-units
to the R-unit.

3.  The values of all sensory to A-unit connections
are fixed (do not change with time).

4.  The transmission time of every connection is
either zero or equal to a fixed constant, $\tau$.

5.  All signal generating functions of S, A, and R
units are of the form $u_i^*(t) = f(\alpha_i(t))$, where
$\alpha_i(t)$ is the algebraic sum of all input signals
arriving simultaneously at the unit $u_i$.

DEFINITION 23:  An elementary perceptron is a simple perceptron with
simple R- and A-units, and with transmission functions
of the form $c_{ij}^*(t) = u_i^*(t - \tau)\, v_{ij}(t)$.

Perceptrons  can  be  represented  graphically  in  several  different 
ways.  In  particular,  frequent  use  is  made  of  three  types  of  diagrams,  which 
will  be  called  network  diagrams,  set  diagrams,  and  symbolic  diagrams. 
Depending  upon  the  level  of  specificity  required,  any  one  of  these  diagrams 
may  be  used  to  represent  the  same  system.  The  three  types  of  diagrams 
are illustrated in Figure 2. The network diagram shows each connection
and signal unit individually; the arrows indicate the direction of signal
transmission through the connections. The set diagram represents all
S-units as a single set, connected to the set of A-units (or association
system) which is represented by a Venn diagram, the subsets of which
are connected to different R-units. Set diagrams of this general type are
found to be particularly useful as an aid to analysis. The symbolic diagram
for this same perceptron merely indicates the kinds of connections which
exist, namely, S to A, A to R, and S to S. The perceptron illustrated
would be called a three-layer perceptron, cross-coupled at the sensory
layer.


Figure  2  PERCEPTRON  DIAGRAMS 
4.4  Stimuli and Environments

DEFINITION 24:  A stimulus is any non-zero set of input signals, $c_{wi}^*(t)$,
to the S-units at time t. If there are $N_s$ sensory
units in the retina, then a stimulus can be characterized by
a vector of $N_s$ elements, representing the signal to each
S-unit as an element of the vector. The condition in
which all input signals are equal to zero is not considered
a stimulus unless otherwise specified.

DEFINITION 25:  A stimulus world (or environment) is any set of stimuli,
defined for a specified S-unit set. The stimulus world
will be symbolized by W. The number of different stimuli
will usually be denoted by n.

DEFINITION 26:  A stimulus-sequence world (or stimulus-sequence
environment) is any set of stimulus sequences, each
consisting of an ordered series of stimuli from the set W.
(For example, if the image of a printed word is a stimulus,
and W consists of all words in a dictionary, then the
set of all English sentences would comprise a
stimulus-sequence world.)

4.5  Response Functions and Solutions

DEFINITION 27:  A response function is any assignment of R-unit output
signals to stimuli in W. For a simple perceptron, the
response function R(W) is a vector of n elements,
$(r_1^*, r_2^*, \ldots, r_n^*)$, indicating the value of the
response for each of the stimuli, $S_1, S_2, \ldots, S_n$, in
the environment.

DEFINITION 28:  A classification is an equivalence class of response
functions. Two response functions are considered
equivalent if their corresponding elements agree in
sign. For any perceptron with one simple R-unit, a
classification, C(W), divides W into two classes:
a positive class consisting of all stimuli for which $r^* = +1$,
and a negative class, consisting of those stimuli for which
$r^* = -1$.
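
The equivalence relation of Definition 28 can be made concrete by reducing each response function to the tuple of signs of its elements. In the following sketch a response function is simply a Python list with one entry per stimulus; the helper names are hypothetical.

```python
def sign(x):
    """Return +1, -1, or 0 according to the sign of x."""
    return (x > 0) - (x < 0)


def classification(response_function):
    """Map a response function (one R-unit output per stimulus) to its
    classification: the tuple of signs of its elements (Def. 28)."""
    return tuple(sign(r) for r in response_function)


def equivalent(rf_a, rf_b):
    """Two response functions are equivalent if corresponding elements
    agree in sign."""
    return classification(rf_a) == classification(rf_b)


def split_classes(response_function):
    """Indices of the stimuli in the positive and in the negative class."""
    signs = classification(response_function)
    positive = [i for i, s in enumerate(signs) if s > 0]
    negative = [i for i, s in enumerate(signs) if s < 0]
    return positive, negative
```

For example, the response functions `[0.3, -2.0, 1.0]` and `[5, -1, 1]` fall in the same classification, since only the signs are compared.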


DEFINITION 29:  A response-sequence function is an assignment of sequences
of R-unit output signals to stimulus sequences in a
stimulus-sequence world. This is a generalization of the
concept of a response function to include a time dimension.

DEFINITION 30:  A solution to a response function (or classification) is said
to exist for a given perceptron if there is a point in the
phase space of the perceptron such that the response $r_i^*$
(specified by the function) will occur if the stimulus $S_i$
is shown, for all $S_i$ in W.

4.6  Reinforcement  Systems 

DEFINITION 31:  A reinforcement system is any set of rules by which
the interaction matrix (or memory state) of a perceptron
may be altered through time.

DEFINITION 32:  A reinforcement control system is any system or
mechanism external to a perceptron which is capable
of altering the interaction matrix of the perceptron in
accordance with the rules of a specified reinforcement
system.

DEFINITION 33:  Positive reinforcement is a reinforcement process in
which a connection from an active unit $u_i$ which
terminates on a unit $u_j$ has its value changed by a
quantity $\Delta v_{ij}(t)$ (or at a rate $dv_{ij}/dt$) which
agrees in sign with the signal $u_j^*(t)$.

DEFINITION 34:  Negative reinforcement is a reinforcement process in
which a connection from an active unit $u_i$ which
terminates on a unit $u_j$ has its value changed by a
quantity $\Delta v_{ij}(t)$ (or at a rate $dv_{ij}/dt$) which
is opposite in sign from $u_j^*(t)$.

DEFINITION 35:  A monopolar reinforcement system is a reinforcement
system in which the values of all connections terminating
on a unit $u_j$ remain unchanged at time t unless $u_j^*(t)$
is strictly positive.

DEFINITION 36:  A bipolar reinforcement system is a reinforcement
system in which the values of connections are subject
to change regardless of whether the output of the
terminal unit is positive or negative.

DEFINITION 37:  Alpha system reinforcement is a reinforcement system
in which all active connections $c_{ij}$ which terminate on
some unit $u_j$ (i.e., connections for which $u_i^*(t - \tau) \ne 0$)
are changed by an equal quantity $\Delta v_{ij}(t) = \eta$ or
at a constant rate while reinforcement is applied, and
inactive connections ($u_i^*(t - \tau) = 0$) are unchanged at
time t. A perceptron in which α-system reinforcement
is employed will be called an α-perceptron. The
reinforcement will be called quantized if the change is a
fixed quantity ($|\Delta v| = |\eta|$), or non-quantized if the value may
change by an arbitrary magnitude.
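
As a minimal sketch of the alpha-system rule of Definition 37, the function below updates the input connections of a single terminal unit: every connection whose origin unit is active gains the same quantity, and the rest are left alone. The list-based representation and the name `alpha_reinforce` are assumptions made for illustration.

```python
def alpha_reinforce(values, activity, eta):
    """Return new connection values for one terminal unit.

    values   -- current values v_ij of the input connections
    activity -- origin-unit outputs u_i*(t - tau), same order as `values`
    eta      -- reinforcement quantity added to each active connection
    """
    return [v + eta if a != 0 else v for v, a in zip(values, activity)]


values = [0.0, 0.5, -1.0, 2.0]
activity = [1, 0, 1, 0]          # first and third origin units are active
updated = alpha_reinforce(values, activity, eta=1.0)
```

With two active connections and `eta = 1.0`, the total of the values rises by 2.0; the alpha system is therefore not conservative, in contrast to the gamma system defined next.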


DEFINITION 38:  Gamma system reinforcement is a rule for changing the
values of the input connections to some unit, whereby all
active connections are first changed by an equal quantity,
and the total quantity added to the values of the active
connections is then subtracted from the entire set of
input connections, being divided equally among them.
Such a system is said to be conservative in the values,
since the total of all values can neither increase nor
decrease. The change in $v_{ij}$ is equal to

$$\Delta v_{ij}(t) = \left[ \omega_{ij}(t) - \frac{\sum_i \omega_{ij}(t)}{N_j} \right] \eta$$

where $\omega_{ij}(t) = 1$ if $u_i^*(t - \tau) \ne 0$, 0 otherwise;
$N_j$ = number of connections terminating on $u_j$;
$\eta$ = reinforcement quantity (typically +1 or 0).
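
The verbal rule of Definition 38 translates directly into code: add the reinforcement quantity to each active connection, then remove the total added in equal shares from all input connections of the unit. The sketch below follows that reading; the function name and the list representation are assumptions of the example.

```python
def gamma_reinforce(values, activity, eta):
    """Gamma-system update for the input connections of one unit.

    Each active connection gains eta; the total added is then subtracted
    from all N_j input connections in equal shares, so the sum of the
    values is unchanged.
    """
    n = len(values)
    omega = [1 if a != 0 else 0 for a in activity]
    share = eta * sum(omega) / n      # amount removed from every connection
    return [v + eta * w - share for v, w in zip(values, omega)]


values = [0.0, 0.5, -1.0, 2.0]
activity = [1, 0, 1, 0]
updated = gamma_reinforce(values, activity, eta=1.0)
```

In this example each active connection nets +0.5 and each inactive one -0.5, so the sum of the values stays at 1.5, which is the conservation property stated in the definition.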

Additional reinforcement rules, and variations of the above,
will be presented as required. The above terminology has been standardized
in previous work on perceptrons, and represents the systems on which most
analysis has been done. In most of the cases to be considered, the
reinforcement control system employs one of three training procedures,
defined as follows:

DEFINITION 39:  A response-controlled reinforcement system (R-controlled
system) is a training procedure in which the magnitude of
$\eta$ is constant, and the sign of $\eta$ is entirely
determined by the current response, $r^*$, regardless of the
current stimulus, S. In general, unless otherwise
specified, this term implies that the reinforcement is
always positive (i.e., the sign of $\eta$ agrees with the
sign of $r^*$, in a simple perceptron).

DEFINITION 40:  A stimulus-controlled reinforcement system (S-controlled
system) is a training procedure in which the magnitude of
$\eta$ is constant, and the sign of $\eta$ is determined
entirely by the current stimulus, S, and a predetermined
classification, C(W); the current response
of the perceptron does not influence either the sign or
magnitude of $\eta$.

DEFINITION 41:  An error-corrective reinforcement system (error
correction system) is a training procedure in which
the magnitude of $\eta$ is 0 unless the current response
of the perceptron is wrong, in which case, the sign of
$\eta$ is determined by the sign of the error. In this
system, reinforcement is 0 for a correct response,
and negative (see Definition 34) for an incorrect response,
or, more generally, $\eta = f(R^* - r^*)$ where $R^*$ is the
required response, $r^*$ is the obtained response, and $f$
is a sign-preserving monotonic function, such that
$f(0) = 0$.

In previous reports (Refs. 41, 82) the R-controlled system
has been referred to as a "spontaneous learning system", since the
perceptron evolves in an autonomous fashion, uninfluenced by the
"correctness" of its outputs. The reinforcement control system requires no
information from the environment in order to control the changes in the
memory state of the perceptron. The S-controlled system has also been
referred to as a "forced learning system", since the reinforcement control
system (r.c.s.) imposes a predetermined classification on the perceptron's
responses, without taking the actual responses of the system into account
at any time.
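
The three training procedures of Definitions 39 to 41 differ only in what information fixes the sign of the reinforcement quantity. The sketch below makes that contrast explicit for a simple perceptron with responses of +1 or -1. The function names, the unit default magnitude, the zero returned for an indeterminate response, and the identity default for f are all assumptions of the example.

```python
def eta_r_controlled(response, magnitude=1.0):
    """R-controlled: the sign of eta follows the current response r*
    (positive reinforcement); the stimulus is ignored."""
    if response > 0:
        return magnitude
    if response < 0:
        return -magnitude
    return 0.0  # indeterminate response: no reinforcement (assumption)


def eta_s_controlled(required, magnitude=1.0):
    """S-controlled: the sign of eta follows the class assigned to the
    current stimulus by C(W); the response is ignored."""
    return magnitude if required > 0 else -magnitude


def eta_error_correction(required, response, f=lambda e: e):
    """Error correction: eta = f(R* - r*), with f sign-preserving,
    monotonic, and f(0) = 0. The identity default is an assumption."""
    return f(required - response)
```

Only the error-correction rule takes both the required and the obtained response as arguments, and it returns zero whenever they agree.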

4.7  Experimental Systems

DEFINITION 42:  An experimental system is a system consisting of a
perceptron, a stimulus world, W, and a reinforcement
control system. The reinforcement control
system may be an automatic regulating device (e.g.,
a thermostat) or a human operator, capable of responding
to the responses of the perceptron and the stimuli in
the environment by applying the appropriate reinforcement
rules, altering the memory state of the perceptron.

Figure 3  EXPERIMENTAL SYSTEM WITH A SIMPLE PERCEPTRON


Figure 4  GENERAL EXPERIMENTAL SYSTEM

The basic organization of an experimental system with a simple
perceptron is shown in Figure 3. A more general system, in which the
perceptron may be of any variety, and where the output of the perceptron
is capable of modifying its stimulus environment, is illustrated in Figure 4.
A comparison with Figure 1 should indicate the basic similarity between the
perceptron, in a general experimental system, and the biological nervous
system. Analyses of perceptron performance always postulate an experimental
system, involving, as a minimum, the components shown in Figure 3.
The reinforcement control system can be considered a specialized part of
the environment, in its relation to the perceptron, although it might actually
be built into the same physical mechanism as the perceptron itself. In an
R-controlled system, the information channel shown from W to the r.c.s.
is non-functional, while in an S-controlled system the information channel
from R to the r.c.s. is non-functional, and in an error-correction system,
both channels are essential for reinforcement control. In digital simulation
programs, the r.c.s. is the part of the program concerned with reinforcing
the simulated perceptron, while in experiments with hardware systems it is
generally a human operator.

An  experiment  involves  an  experimental  system,  a  training 
procedure,  and  a  procedure  for  testing  the  perceptron,  or  measuring  its 
performance.  A  number  of  typical  psychological  experiments,  which  are 
of interest for perceptrons, were outlined in Chapter 3, and some of
these  will  be  analyzed  in  the  following  chapters. 


PART II

THREE-LAYER SERIES-COUPLED PERCEPTRONS


5.  THE EXISTENCE AND ATTAINABILITY OF SOLUTIONS IN ELEMENTARY PERCEPTRONS


The perceptrons to be considered in Part II all consist of
three layers of units connected in series, with the topology S → A → R.

In  the  following  chapters,  it  will  be  seen  that  these  perceptrons  are 
capable  of  learning  any  set  of  responses  which  we  might  care  to  have  them 
make  to  a  universe  of  stimuli.  Their  main  deficiencies  are  a  lack  of  ability 
to  generalize  their  performance  to  new  stimuli  or  new  situations  where  they 
have  not  been  explicitly  taught  and  a  lack  of  ability  to  analyze  complex 
environmental  situations  into  simpler  parts. 

The first perceptron model to be considered in detail is the
elementary α-perceptron. In this chapter, we shall examine the intrinsic
ability of such systems to realize solutions to classification problems,
including several theorems concerning the relationship of the size of the
system to the existence of solutions, and the possibility of attaining such
solutions by different training procedures. The term "solution" is used in
the sense of Def. 30, in Chapter 4. Most of these results were first presented
in Ref. 86.

5.1  Description of Elementary α-Perceptrons


Elementary α-perceptrons were defined in Chapter 4, as a
subclass of simple perceptrons, in which S-units send connections to
A-units, and the A-units all send connections to a single R-unit, no
other connections being permitted, and all connections having equal
transmission times, $\tau$. Without loss of generality, $\tau$ can be taken to be
zero, and this assumption of instantaneous transmission will be made
whenever we deal with simple perceptrons, unless otherwise stated. The
A-units and R-unit in all elementary perceptrons are of the simple type,
i.e., they have a threshold, $\theta$ (equal to zero in the case of the R-unit),
and emit a signal only if the input signal, $\alpha$, is equal to or greater than $\theta$.
The connections from S to A-units have fixed values, and the connections
from the A-units to the R-unit have variable values, which depend on the
history of reinforcements applied to the perceptron. The connections, in an
elementary perceptron, all have the transmission function
$c_{ij}^*(t) = u_i^*(t)\, v_{ij}(t)$ (assuming $\tau$ to be zero).


In the α-system, which is to be considered initially, the reinforcement
rule takes the form

$$\Delta v_{ij}(t) = \begin{cases} \eta & \text{if } \alpha_i(t) \ge \theta \\ 0 & \text{otherwise} \end{cases}$$


In an elementary perceptron, where the only variable connections occur
from A-units to the R-unit, the simplified notation $v_i$ will generally
be taken to mean the value of the connection from unit $a_i$ to the R-unit.
The basic parameters with which we shall be concerned in this chapter are
the number of S-units, $N_s$, and the number of A-units, $N_a$.
Without loss of generality, we can assume the sensory units to be
situated at points in a two-dimensional field, or "retina", and regard the
input stimuli as patterns of illumination on the retina. A typical system
of this type is illustrated in Figure 5.

Figure 5  NETWORK ORGANIZATION OF A TYPICAL ELEMENTARY PERCEPTRON


5.2  The Existence of Universal Perceptrons


Most of the theoretical results obtained to date for elementary
perceptrons are concerned with experiments in which a classification of an
environment, C(W), is taught to the perceptron by some training procedure.
The first theorems to be considered deal with the question of whether
a solution to such a classification problem exists, or might exist, for a
given perceptron. To begin with, the following theorem shows that the
organization of an elementary perceptron is sufficient to permit the
construction of a "universal system", for which a solution exists for every
possible classification, C(W). Perceptrons constructed in this manner
are generally not very interesting as brain models, but the theorem indicates
the wide range of possible behavior which might be obtained from such
systems.

THEOREM 1:  Given a retina with two-state (on or off) input signals,
the class of elementary perceptrons for which a
solution exists to every classification, C(W), of
possible environments W, is non-empty.

PROOF:  Since it is sufficient to show the existence of such a perceptron,
we proceed by construction. Let there be one A-unit for every possible
stimulus configuration on the retina. Consider stimulus $S_i$ and its
corresponding A-unit, $a_i$. Let $a_i$ have an excitatory connection
(value equal to +1) originating from every "on" point in $S_i$, and an
inhibitory connection from every "off" point in $S_i$, and let its threshold
be equal to the number of excitatory connections. Then there will be one
and only one A-unit responding to every possible stimulus, and no
A-unit responds to more than one stimulus. (We say that $a_i$ "responds"
to $S_j$ if $a_i^* > 0$.) Now consider any stimulus world, W, defined on
the retina, and a corresponding classification, C(W), which associates
a positive or negative classification with each stimulus, $S_i$, in W.
In order to realize the classification, it is only necessary to set the
value of the connection from $a_i$ equal to +1 if the class of $S_i$ is positive,
or -1 if the class of $S_i$ is negative. Q.E.D.
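
The construction used in the proof of Theorem 1 is small enough to run directly. The following sketch builds one A-unit per non-zero binary stimulus on a three-point retina and checks that an arbitrarily chosen classification is realized. The retina size, the parity classification, and all function names are assumptions made for illustration; the all-zero input is skipped because Definition 24 does not count it as a stimulus.

```python
from itertools import product


def build_universal_perceptron(n_sensory):
    """One A-unit per non-zero binary stimulus, as in the proof of
    Theorem 1. Each A-unit is a (weights, threshold) pair: +1 from every
    "on" point of its stimulus, -1 from every "off" point, threshold equal
    to the number of "on" points."""
    units = {}
    for stim in product((0, 1), repeat=n_sensory):
        if any(stim):  # the all-zero input is not a stimulus (Def. 24)
            weights = tuple(1 if s else -1 for s in stim)
            units[stim] = (weights, sum(stim))
    return units


def a_unit_outputs(units, stimulus):
    """Output (1 or 0) of every A-unit for the presented stimulus."""
    out = {}
    for key, (weights, theta) in units.items():
        alpha = sum(w * s for w, s in zip(weights, stimulus))
        out[key] = 1 if alpha >= theta else 0
    return out


def respond(units, values, stimulus):
    """Simple R-unit: sign of the summed, weighted A-unit outputs."""
    active = a_unit_outputs(units, stimulus)
    total = sum(values[k] * a for k, a in active.items())
    return 1 if total > 0 else -1 if total < 0 else 0


units = build_universal_perceptron(3)
# An arbitrary classification: positive class = odd number of "on" points.
target = {s: (1 if sum(s) % 2 else -1) for s in units}
values = dict(target)  # A-to-R value = +1 or -1 according to the class
```

The cost of the construction is visible in `len(units)`: the number of A-units grows as 2 to the power of the number of sensory points, which is why the text calls the solution uneconomical.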

While this solution is clearly uneconomical and of little practical
interest, it is sufficient to show that there are no "special cases" of
classifications which have no solution, at least for a retina of binary elements.
If the inputs to the S-units are capable of taking on more than two values,
then a more elaborate construction (e.g., one which separates each combination
of input values to a different set of A-units) would be required. It is left to
the reader to satisfy himself that a system with less "depth" than an elementary
perceptron (i.e., one in which S-units are connected directly to the R-unit,
with no intervening A-units) is incapable of representing a solution to every
C(W), no matter how the values of the connections are distributed.

5.3  The G-matrix of an Elementary α-Perceptron


In  practice,  the  cases  of  interest  are  those  in  which  each 
stimulus  activates  some  set  of  A-units,  and  each  A-unit  is  likely  to 
respond  to  a  great  many  different  stimuli  in  W  .  In  order  to  deal  with 
such  systems,  the  concept  of  a  G-matrix  has  been  found  to  be  particularly 
helpful, and this will now be defined. The definition given here is
sufficient for elementary perceptrons, and will be generalized in a later
chapter to permit us to deal with more complex systems.

DEFINITION:  Consider a (simple) perceptron, and a stimulus world, W,
consisting of n stimuli. Then the matrix

$$G = \begin{pmatrix} g_{11} & g_{12} & \cdots & g_{1n} \\ g_{21} & g_{22} & \cdots & g_{2n} \\ \vdots & \vdots & & \vdots \\ g_{n1} & g_{n2} & \cdots & g_{nn} \end{pmatrix}$$

consists of elements $g_{ij}$ called generalization coefficients. Each
element, $g_{ij}$, is equal to the total change in value ($\sum \Delta v$) over
all A-units in the set responding to $S_i$ if the set of units responding to
$S_j$ are each reinforced with $\eta$ equal to $1/N_a$ (where $N_a$ is equal to
the number of A-units in the system). For simple perceptrons and a
given environment, G is fixed for all time.


If we are dealing with a particular α-perceptron, where
$\eta = 1/N_a$, we have

$$g_{ij} = n_{ij}/N_a$$

where $n_{ij}/N_a$ = the proportion of A-units which respond both to $S_i$
and $S_j$ ($n_{ij}$ being the number of such units).

If we are dealing with a randomly selected member of a class of perceptrons,
$g_{ij}$ is a random variable, and we have the equation for the expected
value of $g_{ij}$,

$$E(g_{ij}) = Q_{ij}$$

where $Q_{ij}$ = the probability that an A-unit in a given class of
perceptrons responds to both stimuli, $S_i$ and $S_j$.

With $\eta = 1/N_a$ we have a "normalized G-matrix". For some purposes
it is convenient to take $\eta = 1$, in which case the "unnormalized G-matrix"
is equal to $N_a$ times the normalized matrix defined above.

For the α-system, $g_{ij}$ is simply a measure of the intersection
of the sets of A-units responding to $S_i$ and to $S_j$, and G is
equivalent to a "set intersection matrix". G is always symmetric for
an alpha system. In any elementary perceptron (at a given time t)
the net input signal to the R-unit from the set of A-units responding to
stimulus $S_i$ will be called $u_i$ and is given by

$$u_i = g_{i1} x_1 + g_{i2} x_2 + \cdots + g_{in} x_n \qquad (5.1)$$

where $x_j$ = the amount of reinforcement applied to the system, over all
appearances of $S_j$ prior to time t. In matrix form, the vector $u$
of signals $u_i$ from all stimuli $S_i$ in W is given by

$$u = Gx \qquad (5.2)$$

where $x$ is a vector of elements $x_j$, defined as above.
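
For an alpha system the G-matrix is a table of set intersections, and Equation 5.2 is an ordinary matrix-vector product. The sketch below computes both in plain Python for a small hypothetical system. The stimulus-to-A-unit sets, the function names, and the use of the unnormalized convention (eta = 1) in the worked values are assumptions of the example, and all connection values are taken to start at zero.

```python
def g_matrix(responding_sets, n_a, eta=None):
    """Generalization coefficients for an alpha-system.

    responding_sets[i] is the set of A-units responding to stimulus S_i.
    With eta = 1/N_a (the default) the result is the normalized G-matrix:
    g_ij = (number of A-units responding to both S_i and S_j) / N_a.
    """
    if eta is None:
        eta = 1.0 / n_a
    return [
        [eta * len(set_i & set_j) for set_j in responding_sets]
        for set_i in responding_sets
    ]


def r_unit_inputs(g, x):
    """u = G x (Eq. 5.2): net R-unit input for every stimulus."""
    return [sum(g_ij * x_j for g_ij, x_j in zip(row, x)) for row in g]


# Hypothetical system: 4 A-units (numbered 0..3) and 3 stimuli.
sets = [{0, 1}, {1, 2}, {3}]
G = g_matrix(sets, n_a=4, eta=1.0)        # unnormalized, eta = 1
u = r_unit_inputs(G, x=[1.0, -1.0, 0.0])  # S_1 reinforced +1, S_2 reinforced -1
```

The result can be checked by hand: reinforcing the units of the first stimulus by +1 and those of the second by -1 leaves A-unit values of 1, 0, -1, 0, so the three stimuli produce R-unit inputs of 1, -1, and 0, in agreement with `u`.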

5.4  Conditions  for  the  Existence  of  Solutions 

In general, if we are given the rules of organization of a
perceptron and some classification, C(W), it is by no means easy to
say whether or not a solution to C(W) exists for the perceptron in question.
The  following  theorems  deal  with  the  existence  of  such  solutions  from 
several  different  points  of  view.  We  first  define  the  bias  ratio  of  an  A-unit 
as  follows: 

DEFINITION:  Given a classification, C(W), the bias ratio of an A-unit,
$a_i$, is defined for any set of stimuli in W as $n_i^+/n_i^-$, where $n_i^+$ =
number of stimuli in the set which are members of the positive class $C^+$ and
which activate $a_i$; $n_i^-$ = number of stimuli in the set which are members
of the negative class $C^-$ and which activate $a_i$.

It is assumed here that all initial $v_i = 0$.
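
The bias ratio is a simple count, and a short sketch may help fix the definition. The code below tallies, for each A-unit, the activating stimuli in the positive and in the negative class over a chosen set of stimuli. The activation table is an illustrative one with five stimuli and four A-units; the function name and the convention of returning `None` when no negative-class stimulus activates a unit are assumptions of the example.

```python
from fractions import Fraction


def bias_ratios(activations, positive, stimuli=None):
    """Return {A-unit: n+/n-} over the given set of stimuli.

    activations -- {stimulus: set of A-units it activates}
    positive    -- set of stimuli in the positive class
    stimuli     -- the set over which the ratio is taken (default: all)
    A unit with n- = 0 gets ratio None (unbounded); this convention is
    an assumption.
    """
    stimuli = set(activations) if stimuli is None else set(stimuli)
    counts = {}
    for s in stimuli:
        for a in activations[s]:
            pos_neg = counts.setdefault(a, [0, 0])
            pos_neg[0 if s in positive else 1] += 1
    return {
        a: (Fraction(p, n) if n else None) for a, (p, n) in counts.items()
    }


# Illustrative activation table: five stimuli, four A-units.
activations = {
    "S1": {"a1"},
    "S2": {"a2"},
    "S3": {"a3", "a4"},
    "S4": {"a1", "a2", "a3"},
    "S5": {"a1", "a2", "a4"},
}
ratios = bias_ratios(activations, positive={"S1", "S2", "S3"})
```

Because the ratio is defined relative to a set of stimuli, the optional `stimuli` argument matters: restricting the count to a subset generally changes the ratios of the A-units involved.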

THEOREM 2:  Given an elementary perceptron and a classification
C(W), the following conditions are necessary
but not sufficient for a solution to C(W) to exist:

i)  Every stimulus must activate at least one A-unit;

ii)  There should be no subset of stimuli containing at
least one member of each class, such that in the
union of the responding A-unit sets, every A-unit
has the same bias ratio (with respect to the stimuli
of the subset).

PROOF: We first prove that the conditions are necessary. Condition i)
is obvious. The proof that condition ii) is necessary is as follows:
Assume there is a subset violating this condition. Let $u_i$ =
input signal to R generated by stimulus $S_i$, and let $v_j$ denote the
value of A-unit $a_j$. Then summing the values of
all such signals from stimuli of the positive class in this subset, we have
(since violation of ii) requires that $n_j^+ / n_j^- = k$ = constant for A-units
responding to stimuli in this subset)

$$\sum_{S_i \in C^+} u_i = \sum_j n_j^+ v_j = k \sum_j n_j^- v_j = k \sum_{S_i \in C^-} u_i , \qquad k > 0 .$$

Thus the sum of the R-unit input signals for stimuli of the positive
class must have the same sign as the sum of the R-unit input signals
for stimuli of the second class. But then one of the sums must disagree
in sign with the sign of the class, and therefore, one of its components
(i.e., one of the $u_i$) must disagree in sign with the class, indicating
that at least one stimulus must be classified incorrectly.
To show that these conditions are not generally sufficient,
consider the following example: Let there be five stimuli, and four A-units.
The A-units activated by each stimulus are:

$S_1$ activates $a_1$

$S_2$ activates $a_2$

$S_3$ activates $a_3$ and $a_4$

$S_4$ activates $a_1$, $a_2$, and $a_3$

$S_5$ activates $a_1$, $a_2$, and $a_4$

Let the positive class consist of $S_1$, $S_2$, and $S_3$, and the negative
class consist of $S_4$ and $S_5$. Then the bias ratios for $a_1$ and $a_2$ are
not the same as for $a_3$ and $a_4$. Also, there exists no subset with
stimuli from each class, with equal bias ratios for all A-units. The
values of $a_1$ and $a_2$ must be positive, and the sum of the values of
$a_3$ and $a_4$ must also be positive, to obtain the required
classification for the members of the first class. But then it is clear that either
$S_4$ or $S_5$ must be classified incorrectly, which proves that conditions i)
and ii) are not sufficient.
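The counterexample can be checked numerically. The sketch below assumes the activation pattern as reconstructed here ($S_4$ on $a_1, a_2, a_3$ and $S_5$ on $a_1, a_2, a_4$); the helper names are hypothetical. It computes the bias ratios over the full stimulus set and then searches a small integer grid of A-unit values, which illustrates (but does not by itself prove) that no assignment classifies all five stimuli correctly. The underlying reason is the identity $u_4 + u_5 = 2(u_1 + u_2) + u_3$, which forces $u_4 + u_5 > 0$ whenever the positive class is handled correctly.

```python
from fractions import Fraction
from itertools import product

# Rows: stimuli S_1..S_5; columns: A-units a_1..a_4 (1 = the unit responds).
A = [
    [1, 0, 0, 0],  # S_1
    [0, 1, 0, 0],  # S_2
    [0, 0, 1, 1],  # S_3
    [1, 1, 1, 0],  # S_4
    [1, 1, 0, 1],  # S_5
]
rho = [+1, +1, +1, -1, -1]  # required class of each stimulus

def bias_ratio(unit):
    """n+/n- for one A-unit, taken over the whole stimulus set."""
    pos = sum(1 for row, r in zip(A, rho) if r > 0 and row[unit])
    neg = sum(1 for row, r in zip(A, rho) if r < 0 and row[unit])
    return Fraction(pos, neg)

def signals(v):
    """R-unit input for every stimulus, given A-unit values v."""
    return [sum(a * vj for a, vj in zip(row, v)) for row in A]

def is_solution(v):
    return all(r * u > 0 for r, u in zip(rho, signals(v)))

ratios = [bias_ratio(j) for j in range(4)]
found = any(is_solution(v) for v in product(range(-3, 4), repeat=4))
print(ratios, found)
```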

In the next theorem we make use of the symbol $u^*$ to denote
a signal vector, such that the element $u_i^*$ agrees in sign with the
classification of $S_i$ in C(W). Such a signal vector will evoke the
correct response for each stimulus in $W$. Two such vectors which
agree in the signs of their elements are said to be in the same orthant
(generalized quadrant, in n dimensions).


In  Theorem  9,  a  necessary  and  sufficient  condition,  closely  related 
to  the  above,  will  be  presented. 
THEOREM 3:


Given an elementary α-perceptron, a stimulus world $W$,
and any classification C(W); then in order for a solution
to C(W) to exist, it is necessary and sufficient that there
exist some vector $u^*$ in the same orthant as C(W), and
some vector $x$ such that $Gx = u^*$.


PROOF: The proof would follow trivially from Equation (5.2) and the
definition of $u^*$, were it not for the possibility that a solution might
exist involving some unique assignment of values to the A-R connections,
which could not be attained by any reinforcement vector, $x$, defined as in
Equation (5.1). It will be shown, therefore, that if a solution exists, in the
form of any assignment of values to A-R connections, an equivalent solution
must exist corresponding to the reinforcement of each stimulus, $S_j$, by an
amount $x_j$. For brevity, throughout the following discussion, we will speak
of "the value of an A-unit" in place of "the value of the connection from an
A-unit to the R-unit". The following definitions and notation will be used:

$$a_j(S_i) = \begin{cases} 1 & \text{if the A-unit } a_j \text{ responds to } S_i \\ 0 & \text{otherwise} \end{cases}$$

$A$ is an $n$ by $N_a$ matrix, in which the element $a_{ij} = a_j(S_i)$.
A solution to a classification problem is said to exist if there is some
distribution of values over the A-units which enables the perceptron to
perform the discrimination; i.e., there exist vectors $v$ and $u^*$ such
that

$$Av = u^*$$
Consider the matrix $AA'$. The $i,j$ element of this matrix (say $\lambda_{ij}$) is

$$\sum_k a_k(S_i)\, a_k(S_j) = \lambda_{ij}$$

But the (un-normalized) G-matrix for an α-system, expressed in
terms of the above functions, has elements

$$g_{ij} = \sum_k a_k(S_i)\, a_k(S_j)$$

so that the matrix $G = AA'$. Note that this shows that G is either
positive definite or positive semidefinite.

We then have, for any vector $x$:

1) $x'A = 0 \Rightarrow x'AA' = x'G = 0$

2) $x'G = 0 \Rightarrow x'Gx = x'AA'x = (x'A, x'A) = 0 \Rightarrow x'A = 0$

Hence, the rank of G = rank of A, since any vector $x$ which is in
the left null space of G is also in the left null space of A, and conversely;
therefore the left null spaces of G and A are identical. Since the rank plus the
dimension of the null space is equal to the dimension of the domain, G and
A must be of the same rank.

But the columns of G are linear combinations of the columns of
A, hence the space spanned by the columns of G is identical with the
space spanned by the columns of A.
Since $Av$ is a linear combination of the columns of A, the
existence of $v$ and $u^*$ such that $Av = u^*$ implies the existence of a vector
$x$ such that $Gx = u^*$. Thus, if a solution exists, there is a solution to
the equation $Gx = u^*$, so that the condition of the theorem is necessary.

But it is also sufficient, since $x$ by definition represents a solution
vector. Q.E.D.

COROLLARY 1: Given an elementary perceptron and a stimulus world $W$,
then if G is singular, some C(W) exists for which
there is no solution.

PROOF: Each C(W) requires a solution vector in a different orthant, and
the set of all C(W), for a given $W$, requires solutions in every possible
orthant. But if G is singular, it maps the entire space into a hyperplane,
and this plane must fail to intersect certain orthants. Consequently, the
classifications C(W) which are represented by vectors in these orthants
have no solution.

COROLLARY 2: Given an elementary perceptron, if the number of stimuli
in $W$ is $n > N_a$, there is some C(W) for which no
solution exists.

PROOF: From Theorem 3 and Corollary 1, it is clear that there will
be some C(W) which has no solution if and only if G is singular. G
has the same rank as the matrix A; but A is an $n$ by $N_a$ matrix,
so that its rank cannot exceed $N_a$,
implying that A, and therefore G, has rank < n.
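The rank argument can be checked on a small case. The sketch below is an illustration under stated assumptions: the 5-by-4 activation matrix is a hypothetical example with $n = 5 > N_a = 4$, and `rank` is a helper written for this note (exact Gaussian elimination over rationals), not a routine from the text. It forms $G = AA'$ and confirms that G is symmetric, has the same rank as A, and is therefore singular.

```python
from fractions import Fraction

def rank(M):
    """Rank by exact Gauss-Jordan elimination over the rationals."""
    M = [[Fraction(x) for x in row] for row in M]
    r = 0
    for c in range(len(M[0])):
        pivot = next((i for i in range(r, len(M)) if M[i][c] != 0), None)
        if pivot is None:
            continue
        M[r], M[pivot] = M[pivot], M[r]
        for i in range(len(M)):
            if i != r and M[i][c] != 0:
                f = M[i][c] / M[r][c]
                M[i] = [a - f * b for a, b in zip(M[i], M[r])]
        r += 1
    return r

# Hypothetical activation matrix: 5 stimuli (rows), 4 A-units (columns).
A = [
    [1, 0, 0, 0],
    [0, 1, 0, 0],
    [0, 0, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 1],
]
n = len(A)
# G = A A': g_ij counts the A-units shared by stimuli i and j.
G = [[sum(a * b for a, b in zip(A[i], A[j])) for j in range(n)] for i in range(n)]

print(rank(A), rank(G), n)  # rank(G) equals rank(A) and is below n
```

Since rank(G) < n here, Corollary 1 applies and at least one of the $2^5$ classifications of these stimuli has no solution.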
COROLLARY 3: For any elementary perceptron, as the number n of
stimuli in W increases, the probability that a randomly
selected classification, C(W), has a solution approaches
zero (where C(W) is chosen from a uniform distribution
over the possible classifications of W).


PROOF: From Corollary 2, as n increases beyond the number of A-units
in the perceptron, there must be some C(W) without a solution. At the same
time, increasing n increases the set of possible classifications in proportion
to $2^n$. But, owing to a theorem by R. D. Joseph and Louise Hay (Ref. 41,
Appendix), the number $n(r)$ of classifications which have solutions is no
greater than

$$2\left[\binom{n-1}{0} + \binom{n-1}{1} + \dots + \binom{n-1}{r-1}\right]$$

where $r \le N_a$ is the rank of the
G-matrix. Therefore, the upper bound of the probability of selecting at random
one of the classifications which has a solution diminishes with $n(r)/2^n$, which
goes to zero as n goes to infinity.
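A short numerical sketch shows how quickly this ratio falls. It assumes the bound in the form $2\sum_{k=0}^{r-1}\binom{n-1}{k}$, as reconstructed above; the function names are hypothetical.

```python
from math import comb

def max_solvable(n, r):
    """Upper bound on the number of classifications of n stimuli with a solution."""
    return 2 * sum(comb(n - 1, k) for k in range(r))

def solvable_fraction_bound(n, r):
    """Upper bound on the probability that a random classification is solvable."""
    return min(1.0, max_solvable(n, r) / 2 ** n)

# Hold the rank fixed at r = 3 and let the number of stimuli grow.
for n in (3, 5, 10, 20):
    print(n, max_solvable(n, 3), solvable_fraction_bound(n, 3))
```

For $n \le r$ the bound equals $2^n$, so every classification may be solvable; beyond that point the fraction decays polynomially over exponentially.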


Several  additional  tests  for  the  existence  of  solutions,  which  are 
of  practical  utility  in  diagnosing  small  systems,  will  be  found  in  Theorems  9 
and  10,  at  the  end  of  this  chapter. 

5.5  The  Principal  Convergence  Theorem 


In the preceding section, the existence of solutions to classification
problems in an elementary perceptron was considered, but nothing has been
said about the ability to achieve such a solution by a training procedure. In
this section, we consider the ability of an elementary α-perceptron to learn
the solution to a classification C(W) under an error correction procedure.

The  following  theorem  is  fundamental  to  the  theory  of  perceptrons. 
A general definition of an error correction procedure was given
in Definition 41, in Chapter 4. We now define in detail two specific forms of
this procedure, as they apply to the elementary α-perceptron.

Consider some classification, C(W). Let

$$\rho_i = \begin{cases} +1 & \text{if stimulus } S_i \text{ is to be in the positive class} \\ -1 & \text{if stimulus } S_i \text{ is to be in the negative class} \end{cases}$$


In order to obtain the most general conditions for the following theorem, a
non-quantized error correction procedure is defined as follows: No response
will be considered correct unless the magnitude of the input signal to the
R-unit ($u_i$) is greater than $\delta$, and the sign of $u_i$ agrees with $\rho_i$
for the current stimulus. (This corresponds to an R-unit with a threshold
of $\delta$, or for the special case where $\delta = 0$, it corresponds to a simple
R-unit.) If no error occurs for stimulus $S_i$ (i.e., $\rho_i u_i > \delta$) no
reinforcement occurs; but if an error does occur a quantity $\eta = \rho_i \Delta x_i$
is added to the value of each active A-unit, $\Delta x_i$ (the number of units of
reinforcement) being just sufficient to bring the magnitude of the signal $u_i$
past the threshold level, $\delta$, to the level $\epsilon > \delta$. In a quantized
correction procedure, the identical rules apply, except that $\eta = \rho_i = \pm 1$,
$\Delta x_i$ representing a single unit of reinforcement.
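Both procedures can be sketched in a few lines. This is an illustration under stated assumptions rather than the author's implementation: stimuli are presented cyclically, all initial values are zero, a zero signal counts as an error when $\delta = 0$, and every stimulus activates at least one A-unit. The activation matrix and all function names are hypothetical. In the non-quantized case the step is $\Delta x_i = (\epsilon - \rho_i u_i)/h_{ii}$, where $h_{ii}$ is the number of active A-units.

```python
def signal(row, v):
    """Input to the R-unit: sum of the values of the active A-units."""
    return sum(a * vj for a, vj in zip(row, v))

def error_correction(A, rho, delta=0.0, eps=None, max_passes=1000):
    """Cycle through the stimuli, correcting only when an error occurs.

    A[i][j] is 1 if A-unit j responds to stimulus i; rho[i] is +1 or -1.
    eps=None selects the quantized procedure (one unit per correction);
    otherwise each correction lifts rho_i * u_i exactly to eps > delta.
    """
    v = [0.0] * len(A[0])
    corrections = 0
    for _ in range(max_passes):
        clean_pass = True
        for row, r in zip(A, rho):
            w = r * signal(row, v)
            if w > delta:          # correct response: no reinforcement
                continue
            clean_pass = False
            corrections += 1
            dx = 1.0 if eps is None else (eps - w) / sum(row)
            v = [vj + r * dx * a for vj, a in zip(v, row)]
        if clean_pass:
            return v, corrections
    raise RuntimeError("no solution reached within max_passes")

# Hypothetical example: two stimuli sharing the middle A-unit.
A = [[1, 1, 0], [0, 1, 1]]
rho = [+1, -1]
v_q, n_q = error_correction(A, rho)                      # quantized
v_c, n_c = error_correction(A, rho, delta=0.5, eps=1.0)  # non-quantized
print(v_q, n_q)
print(v_c, n_c)
```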
THEOREM 4:


Given an elementary α-perceptron, a stimulus
world $W$, and any classification C(W) for which a
solution exists; let all stimuli in $W$ occur in any
sequence, provided that each stimulus must reoccur
in finite time; then beginning from an arbitrary initial
state, an error correction procedure (quantized or
non-quantized) will always yield a solution to C(W) in
finite time, with all signals to the R-unit having
magnitudes at least equal to an arbitrary quantity $\delta \ge 0$.

PROOF: The matrix A is defined as in Theorem 3, so that $a_{ij} = a_j(S_i)$.
We recall that $G = AA'$. We also define the matrix $B$ such that
$b_{ij} = \rho_i a_{ij}$; the matrix $H = BB'$; and the diagonal matrix $D$
such that $d_{ii} = \rho_i$. Note that $D^2 = I$, $DA = B$, and $H = DGD$.

We first consider the non-quantized error correction procedure.
In this case, no reinforcement is applied unless an error occurs; if an error
does occur (when $\rho_i u_i \le \delta$) the quantity $\rho_i \Delta x_i$ ($\Delta x_i > 0$) is added
to the value of each active A-unit, $\Delta x_i$ being chosen so that the input to
the response unit is exactly $\rho_i \epsilon$ ($\epsilon > \delta$). It will be shown below that
such a $\Delta x_i$ exists.


The proof of this theorem (which was first published by Rosenblatt in
Ref. 86) has undergone a number of modifications. The original treatment
was insufficient to prove the theorem in a rigorous fashion;
subsequent forms have been due to Block, Joseph, Kesten, and others;
and the present proof owes much to each of these. An interesting
alternative approach, with a slightly modified reinforcement procedure,
has recently been proposed by Papert (Ref. 67) who attempts to shorten
the demonstration and avoids use of the G-matrix. Unfortunately, there
are several logical errors in Papert's argument, the correction of which
would tend to lengthen his demonstration.
It has been noted previously that the space spanned by the columns
of G is the same as the space spanned by the columns of A (the rank of
G being equal to the rank of A). Consequently, for any $N_a$-vector $V$,
there is an $n$-vector $Z$ such that $AV = GZ$.

An arbitrary initial state for the perceptron is represented by an
$N_a$-vector $V^\circ$ of values for the A-units. Let $Z^\circ$ be a corresponding
$n$-vector. Let $Z$ be the $n$-vector whose $i$th component, $z_i$, is
equal to the total quantity of reinforcement given in all previous corrections
for stimulus $S_i$, i.e.,

$$z_i = \sum \rho_i \, \Delta x_i \quad \text{(summing over all previous corrections)}.$$

Let $U = GZ^\circ + GZ = G(Z^\circ + Z) = GD(X^\circ + X)$, where $X^\circ = DZ^\circ$ and
$X = DZ$. The $i$th component of $U$, $u_i$, would be the input to the
R-unit if $S_i$ were to occur at the present time. Let $W = DU$. This
equation can be written

$$W = H(X^\circ + X)$$

where a negative $w_i$ (or more precisely, $w_i \le \delta$) represents an error.
The $x_i$ are always non-negative, and this will be understood for the
remainder of the proof. We now define $M$ as the maximum diagonal element,
$h_{ii}$, of $H$. We also define the function of the $n$-vector $X$

$$K(X) = X'HX - 2\epsilon \sum_{i=1}^{n} x_i$$
We then obtain the following results:

1) The existence of a solution means that there is an $N_a$-vector $V^*$ such
that for all $i$

$$\rho_i \sum_j a_{ij} v_j^* = w_i^*$$

where $w_i^* > 0$. In matrix form

$$BV^* = W^*$$

2) Consider $X'HX$ for all $X$ such that $\|X\| = 1$ (and of course $x_i \ge 0$).
$X'HX = (X'B)(X'B)'$ so that $X'HX \ge 0$. Suppose $X'HX = 0$; then $X'B = 0$.
Clearly $X'W^* > 0$, but $X'W^* = X'BV^* = 0$. This contradiction shows
that $X'HX > 0$ on this closed, bounded set, so that there exists a minimum
$\alpha > 0$ such that $X'HX \ge \alpha \|X\|^2$ for all $X$ for which $x_i \ge 0$ for all $i$.
Note that $M \ge \alpha > 0$ as a consequence. Note also that $h_{ii} \ge \alpha > 0$.

3) $\sum_{i=1}^{n} x_i \le \sqrt{n}\, \|X\|$ (Schwarz's inequality)

and $|X'HX^\circ| \le \|HX^\circ\| \cdot \|X\| = \Lambda \|X\|$ (Schwarz's inequality), where $\Lambda = \|HX^\circ\|$.

4) $$K(X^\circ + X) - K(X^\circ) = K(X) + 2X'HX^\circ \ge \alpha \|X\|^2 - 2\epsilon\sqrt{n}\, \|X\| - 2\Lambda \|X\| \ge -\frac{(\Lambda + \epsilon\sqrt{n})^2}{\alpha}$$

5) $$\frac{\partial K(X^\circ + X)}{\partial x_i} = 2w_i - 2\epsilon \quad \text{and} \quad \frac{\partial w_i}{\partial x_i} = h_{ii} > 0 .$$

This latter relation proves the contention at
the beginning of the proof that $\Delta x_i > 0$ exists. Specifically, we have

$$\Delta x_i = \frac{\epsilon - w_{i0}}{h_{ii}}$$

where $w_{i0}$ is the value of $w_i$ before the correction.

6) A correction is made for $S_i$ only if $w_i \le \delta$. Denote the change in K
when this is done by $\Delta K$, and by subscript 0 the conditions before the
correction.

$$\Delta K(X^\circ + X) = \int_{x_{i0}}^{x_{i0} + \Delta x_i} 2(w_i - \epsilon)\, dx_i = 2\int_{w_{i0}}^{\epsilon} \frac{w_i - \epsilon}{h_{ii}}\, dw_i = -\frac{(w_{i0} - \epsilon)^2}{h_{ii}} \le -\frac{(\epsilon - \delta)^2}{M}$$

7) From 4) and 6) we conclude that the maximum number of corrections
is

$$N \le \frac{M(\Lambda + \epsilon\sqrt{n})^2}{\alpha(\epsilon - \delta)^2}$$
8) In particular, if $X^\circ = 0$ and $\delta = 0$ (corresponding to a perceptron with
a simple R-unit and no initial reinforcement) then $\Lambda = \|HX^\circ\| = 0$ and
the bound becomes $nM/\alpha$.


This proves the theorem for the case of the non-quantized
correction procedure, since N is finite, implying that the process arrives
at a solution in finite time. For the quantized case, we have the condition
that $\Delta x_i$ is always 1 when a correction occurs (the vector $X$ representing
the numbers of unit corrections for each of the n stimuli). For convenience,
we take the case where $\delta = 0$ and $\epsilon = M = (h_{ii})_{max}$. Then in place of step 6)
we have:

6a) $$\Delta K(X^\circ + X) = \int_{x_{i0}}^{x_{i0}+1} 2(w_i - M)\, dx_i = 2\int_{x_{i0}}^{x_{i0}+1} \left[ w_{i0} + h_{ii}(x_i - x_{i0}) - M \right] dx_i = 2(w_{i0} - M) + h_{ii} \le -M$$

7a) From 4) and 6a) we have that the maximum number of corrections is

$$N \le \frac{(\Lambda + M\sqrt{n})^2}{\alpha M}$$

An  alternative  bound,  found  by  H.  Kesten,  is  i, x  h;-.  '  . 

This  under  some  circumstances  represents  a  sharper  bound;  nonetheless, 
both  bounds  are  generally  quite  poor,  as  estimates  of  the  actual  number 
of  steps . 
8a) This upper bound is again minimized when $X^\circ = 0$ so that $\Lambda = \|HX^\circ\| = 0$.
The bound is then $nM/\alpha$.

This completes the proof of the theorem for the quantized case.

Q.E.D.

COROLLARY: Given an elementary perceptron, a stimulus world $W$,
and any classification C(W); then if a solution to C(W)
exists, the set of possible solutions to C(W) has positive
measure over the phase space of the perceptron.

PROOF: From the proof of the theorem, we know that if a solution exists,
there is a strictly positive vector $X$ such that $HX = P$ (where $P$ is a
strictly positive vector). Let $Y$ be any $n$-vector; then $\|HY\| \le b\|Y\|$
where $b$ is the absolute value of the maximum eigenvalue of $H$, or the
norm of $H$. Let $\mu = \min_i p_i > 0$, and let $\epsilon = \mu / 2b$. Let $U$
be in the $\epsilon$-sphere around $X$, i.e., $U = X + Y$ where $\|Y\| \le \epsilon$. Let
$Z = HY$, so that $|z_i| \le \|Z\| = \|HY\| \le b\epsilon = \mu/2$. Then

$$HU = H(X + Y) = P + Z$$

Therefore, $HU$ is strictly positive, and $U$ is an alternative solution.

This means that there is a cone of vectors including $X$ which maps
into the region which contains $P$, any such vector representing an
equivalent solution. Since the volume of this cone has positive measure over the
phase space, the corollary follows.
Additional Convergence Theorems


The theorem in the previous section deals with convergence to a
solution state in an α-perceptron, trained by the error correction procedure.
In this section, it will be shown, first, that a weaker form of correction
procedure can also be guaranteed to yield a solution; secondly, that
reinforcement procedures in which the magnitude of $\eta$ does not depend on
whether or not the current response is correct cannot, in general, be relied
on to converge to a solution. If a solution state does occur in such a system,
it will be shown that it is apt to be unstable except under special conditions.

DEFINITION: A random-sign correction procedure is one in which some
quantity of reinforcement is applied to the perceptron when an error occurs
and zero reinforcement is applied when the response is correct. The sign
of $\eta$ is chosen at random, with an equal probability of being positive or
negative, regardless of the response of the perceptron.

THEOREM 5: Given an elementary α-perceptron, with a finite
number of memory states, a random-sequence stimulus
world $W$, and any classification C(W) for which a
solution can be reached from the starting point by some
reinforcement sequence, then a solution will be obtained
in finite time with probability 1 by means of a random-sign
correction procedure.

PROOF: The random-sign correction procedure consists of a random
walk in which each step corresponds either to a step of the required
correction process, or a step in the reverse direction. In the course of
this process, the vector $U$ (defined in connection with Theorem 4) will
eventually reach some attainable trapping state with probability 1. But the
only trapping states are in the solution space. Consequently, a solution
will be obtained in finite time.
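The random-walk character of this procedure is easy to see in simulation. The sketch below is a toy illustration under stated assumptions: values are integers clipped to the range $[-3, 3]$ to model a finite number of memory states, stimuli are drawn uniformly at random, a zero signal counts as an error, and the two-stimulus world is hypothetical. Because a correct response leaves the values untouched, any solution state is a trapping state.

```python
import random

def signal(row, v):
    return sum(a * vj for a, vj in zip(row, v))

def is_solution(A, rho, v):
    return all(r * signal(row, v) > 0 for row, r in zip(A, rho))

def random_sign_correction(A, rho, bound=3, seed=0, max_steps=100_000):
    """On each error, add +1 or -1 (chosen at random) to every active A-unit."""
    rng = random.Random(seed)
    v = [0] * len(A[0])
    for step in range(max_steps):
        if is_solution(A, rho, v):
            return v, step
        i = rng.randrange(len(A))            # random-sequence stimulus world
        if rho[i] * signal(A[i], v) > 0:
            continue                         # correct response: no change
        eta = rng.choice((+1, -1))           # sign ignores the required class
        v = [max(-bound, min(bound, vj + eta * a)) for vj, a in zip(v, A[i])]
    raise RuntimeError("no solution reached within max_steps")

# Hypothetical world: S_1 (positive) and S_2 (negative) on disjoint A-units.
A = [[1, 0], [0, 1]]
rho = [+1, -1]
v, steps = random_sign_correction(A, rho)
print(v, steps)
```

The number of steps depends on the seed; only eventual arrival at a solution is expected, which is all the theorem asserts.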

In Chapter 4, (Definition 40) an S-controlled reinforcement
system was defined as a training procedure in which the magnitude of $\eta$ is
constant, regardless of the current response of the system, the sign of $\eta$
being chosen to agree with the sign of the classification of the current stimulus,
$S_i$, in C(W). Unlike the methods considered previously in this chapter,
this is not a correction procedure; i.e., the magnitude of reinforcement does
not depend on the occurrence of an error, and only the sign of the required
response is taken into consideration in determining what reinforcement
should be applied. In the following analysis, a solution will be called stable
if, in a given experimental system, all future memory states will also
satisfy the conditions of a solution, no matter how long the experiment
continues. A system employing a correction procedure, since it receives
no further reinforcement once a solution state is achieved, is inherently
stable. The following theorem shows that this is not the case for an
S-controlled system.

THEOREM 6: Given an elementary α-perceptron, a stimulus world $W$,
and some classification C(W) for which a solution exists,
a solution can sometimes be achieved by an S-controlled
reinforcement procedure. However, such a solution cannot
be guaranteed for an arbitrary stimulus sequence,* and may be
unstable if it occurs.

PROOF: We will first consider a case in which a stable solution does occur,
for the type of experimental system specified by the theorem. Let $W$ consist
of two stimuli, $S_1$ and $S_2$. Let $S_1$ activate some set of A-units, $A_1$,
and let $S_2$ activate a disjoint set of A-units, $A_2$. Let C(W) assign $S_1$
to the positive class and $S_2$ to the negative class. Regardless of the
sequence and relative frequency of $S_1$ and $S_2$, it is clear that each
occurrence of $S_1$ will augment $u_1$ in a positive direction, while each
occurrence of $S_2$ will make $u_2$ increasingly negative. Since the intersection
$A_1 \cap A_2$ is assumed to have zero measure, there will be no interference between
the two stimuli, so that the acquired solution will remain stable no matter how
long the process continues. This example proves the first part of the theorem.
Let us now consider the case of intersecting A-unit sets. Suppose $S_1$ activates
two units, $a_1$ and $a_c$, while $S_2$ activates units $a_2$ and $a_c$ (the unit $a_c$
responding to both stimuli). If the frequencies of $S_1$ and $S_2$ are equal, their
effect on $a_c$ will tend to cancel, and a solution with $v_1$ positive, $v_2$ negative,
and $v_c$ equal to zero will tend to occur. As the sequence continues, the
magnitudes of $v_1$ and $v_2$ will tend to increase without bound, so that the solution
will become increasingly stable as time goes on. Suppose, on the other hand,
that $S_1$ occurs with ten times the frequency of $S_2$. In this case, $a_c$ will
gain ten units of positive value for every unit of negative value received from
$S_2$, so that $v_c$ will tend to increase in a positive direction at nine times
the rate that $v_2$ progresses in a negative direction. Thus the net signal, $u_2$,
transmitted to the R-unit in response to $S_2$, which is equal to $v_2 + v_c$,
will clearly become strongly positive as time goes on, resulting in an
erroneous classification of $S_2$. Even if the initial state of the perceptron
was a solution state (e.g., $v_1 = +1$, $v_2 = -1$, $v_c = 0$) it is clear that
the S-controlled procedure will quickly destroy the existing solution, which
is therefore unstable. Q.E.D.
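The two cases in this proof can be replayed numerically. The sketch below is a minimal illustration assuming unit reinforcement ($|\eta| = 1$) and the three-unit layout described above, with the shared unit written as `a_c`; the function names and the particular stimulus sequences are hypothetical.

```python
# Columns: a_1, a_2, a_c.  S_1 activates a_1 and a_c; S_2 activates a_2 and a_c.
A = {1: (1, 0, 1), 2: (0, 1, 1)}
rho = {1: +1, 2: -1}  # S_1 belongs to the positive class, S_2 to the negative

def s_controlled(v, sequence):
    """Apply unit reinforcement with the sign of the class, errors or not."""
    v = list(v)
    for s in sequence:
        v = [vj + rho[s] * a for vj, a in zip(v, A[s])]
    return v

def signal(v, s):
    return sum(a * vj for a, vj in zip(A[s], v))

# Equal frequencies: the shared unit cancels and the solution strengthens.
balanced = s_controlled([0, 0, 0], [1, 2] * 5)

# S_1 ten times as frequent as S_2, starting from a valid solution state.
biased = s_controlled([1, -1, 0], ([1] * 10 + [2]) * 3)

print(balanced, signal(balanced, 1), signal(balanced, 2))
print(biased, signal(biased, 2))  # positive signal for S_2: misclassified
```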

* H. D. Block has pointed out that, while a solution to C(W) cannot be
guaranteed with a random stimulus sequence, nonetheless if a solution exists then
there exists some S-sequence which will guarantee a solution with S-controlled
reinforcement. In particular, if $Gx = u^*$ is a solution, then the occurrence of
$S_i$ with frequency $f_i = |x_i|$ (for all $i$) will guarantee a solution.




In the example considered above, it is clear that a frequency bias,
in which the stimuli of one class are much more frequent than members of the
other class, can strongly prejudice the perceptron to always give the response
associated with the more frequent class, in an S-controlled system. Such a
problem would exist, for example, in trying to teach a perceptron to distinguish
the letters "E" and "X" occurring with their normal frequency in English text.
Even if all stimuli occur with equal frequency, however, a similar effect
exists if there is a size bias, in which the stimuli in one class activate
more S-points (or illuminate a larger area of the retina) than the other class.
As will be seen in the following chapter, larger stimuli generally tend to
activate more A-units than smaller stimuli, and in the limiting case, the set
of A-units responding to a smaller stimulus may be entirely contained within
the set responding to a larger stimulus. Suppose, for example, that $S_1$
activates units $a_1$ and $a_2$, while $S_2$ only activates $a_2$. A solution which
classifies $S_1$ positively and $S_2$ negatively clearly exists (e.g., let $v_1 = +2$
and $v_2 = -1$) but if the stimuli occur alternately, $v_1$ will tend to become
increasingly positive, while $v_2$ tends to oscillate about zero. The reader
can satisfy himself that (starting with 0 values) a quantized error correction
procedure yields a stable solution to this problem after five stimuli.
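The size-bias example can be traced step by step. The sketch below assumes strictly alternating stimuli, zero initial values, unit reinforcement, and that a zero signal counts as an error; the function names are hypothetical. Under these assumptions the quantized error correction procedure makes five corrections before settling, which is one reading of the "five stimuli" mentioned above.

```python
# S_1 activates a_1 and a_2; S_2 activates only a_2.
A = [(1, 1), (0, 1)]
rho = [+1, -1]  # S_1 positive, S_2 negative

def signal(i, v):
    return sum(a * vj for a, vj in zip(A[i], v))

def s_controlled(steps):
    """Reinforce with the sign of the class on every presentation."""
    v = [0, 0]
    for t in range(steps):
        i = t % 2
        v = [vj + rho[i] * a for vj, a in zip(v, A[i])]
    return v

def quantized_error_correction(first=0, max_steps=100):
    """Alternate the stimuli, adding one unit only when an error occurs."""
    v, corrections, streak = [0, 0], 0, 0
    for t in range(max_steps):
        i = (t + first) % 2
        if rho[i] * signal(i, v) > 0:
            streak += 1
            if streak == 2:          # both stimuli answered correctly in a row
                return v, corrections
        else:
            streak = 0
            corrections += 1
            v = [vj + rho[i] * a for vj, a in zip(v, A[i])]
    raise RuntimeError("no solution reached within max_steps")

print(s_controlled(10))                 # v_2 returns to zero: S_2 is not rejected
print(quantized_error_correction(0))    # starting with S_1
print(quantized_error_correction(1))    # starting with S_2
```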

In the case of R-controlled reinforcement procedures (Definition 39
in Chapter 4) it makes no sense to talk about the probability of convergence to
a solution for an arbitrary classification, C(W), since the required
classification plays no part whatever in determining either the sign or the
magnitude of the reinforcement. As will be shown later, it may happen
that an R-controlled reinforcement system leads to the acquisition of an
interesting stable response function by a perceptron, but this cannot
generally be guaranteed, and any classification which is achieved is
necessarily one which is selected by the perceptron, rather than by the
experimenter. The interesting questions concerning such systems deal with the
types of classifications to which they converge, for different kinds of
environments. In particular, we will be interested in any systems which
tend to form classifications on the basis of some concept of stimulus
"similarity". It will be shown in later chapters that elementary perceptrons
do not, in general, tend to form classes on this basis except under special,
and highly restrictive, environmental conditions, but that cross-coupled
perceptrons appear to have a striking capability for such "spontaneous
organization".

In the preceding theorems, only perceptrons employing alpha
system reinforcement have been considered. The remaining two theorems
consider two departures from this model. The first demonstrates that an
even weaker form of reinforcement than that in the random-sign correction
procedure can guarantee a solution in finite time, provided it is employed in
a correction procedure, in which the application of reinforcement depends
upon the occurrence of response errors. We define a random perturbation
correction procedure as a reinforcement process in which, if an error occurs,
reinforcement is applied to the active A-units, as in the α-system, except
that the magnitude and sign of $\eta$ are both chosen independently and
separately for each reinforced connection in the system, according to some
probability distribution.

THEOREM 7: Given an elementary perceptron with a finite number
of memory states, a stimulus world W, and a
classification C(W) for which a solution can be reached
from the starting point by some reinforcement sequence,
then a solution can always be obtained in finite time by
means of a random perturbation correction procedure.

PROOF: The reinforcement process is a random walk, which (for the
given conditions) will eventually take the representative point of the system
to every attainable point in phase space. Since the number of points is assumed
to be finite, a solution must be reached in finite time.

Of the three reinforcement procedures which have been shown
to guarantee solutions in elementary perceptrons (error correction,
random-sign correction, and random perturbation correction procedures) the first
is clearly the strongest, and can be expected to converge most rapidly. The
random perturbation procedure will converge most slowly, since it must
hunt through a large domain of the phase space of the system before achieving
a satisfactory terminal state, and is not guided during this process by any
directional constraints. In this respect, it shares many of the difficulties
of Ashby's homeostat (Ref. 3); but it shares the virtue of the homeostat as
well, that if the solution space is attainable, it will ultimately arrive at a
solution no matter how complicated its functional representation may be.

The random-sign and random perturbation procedures may prove to be of
interest in biological models, since the only information required for the
control of reinforcement is whether or not an error has occurred.

In practice, it will be seen that a gamma system (Definition 38,
Chapter 4) generally works at least as well as, and sometimes better than,
an alpha system. Nonetheless, the following theorem indicates that the
gamma system lacks the true universality of the alpha system.

THEOREM 8: Given an elementary $\gamma$-perceptron, a stimulus
world W, and a classification C(W), it is possible
that a solution to C(W) exists which cannot be
achieved by the perceptron.

PROOF: Let each A-unit be activated for at least one stimulus in W,
and let each stimulus activate a disjoint set of A-units. Let the classification
function C(W) be one which assigns every stimulus to the same class, either
positive or negative. A solution clearly exists, if the values of all connections
are positive (or negative, as required by the classification). But if the initial
state of the system is one in which all values are zero, or of the wrong sign, a
solution can never be achieved by the gamma system, since a solution requires
that the total value of each set of units responding to a stimulus $S_j$, and
consequently the total value over the entire A-set, should agree in sign
with the classification. In the gamma system this is impossible, since the
sum of the values remains constant at its initial value. The conservative
property of the gamma system gives it one degree of freedom less than the
alpha system, making it impossible to achieve a solution to such problems
unless at least one surplus A-unit (which does not respond to any stimuli)
exists.
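
The conservation argument can be made concrete with a short sketch. It assumes one particular conservative rule, in which every active A-unit gains eta and the mean gain is then subtracted from all units; this rule is an assumption chosen only because it keeps the sum of the values fixed, and the function names are hypothetical.

```python
def gamma_update(values, active, eta):
    """Add eta to each active unit, then subtract the mean gain from all units.

    Assumed conservative rule: the sum of `values` is unchanged by the update.
    """
    n_units = len(values)
    mean_gain = eta * sum(active) / n_units
    return [v + eta * a - mean_gain for v, a in zip(values, active)]


def signal(values, active):
    """Total value of the A-units responding to one stimulus."""
    return sum(v for v, a in zip(values, active) if a)


def train_all_positive(stimuli, n_units, eta=1.0, rounds=50):
    """Error correction for a classification that puts every stimulus in the positive class."""
    values = [0.0] * n_units
    for _ in range(rounds):
        for active in stimuli:
            if signal(values, active) <= 0:      # error: response should be positive
                values = gamma_update(values, active, eta)
    return values


if __name__ == "__main__":
    # Two stimuli with disjoint A-sets that together cover all four units.
    stimuli = [[1, 1, 0, 0], [0, 0, 1, 1]]
    values = train_all_positive(stimuli, n_units=4)
    print(values, [signal(values, s) for s in stimuli])
```

With the disjoint sets covering every unit, the two signals always add up to the (zero) total, so they can never both be positive. Adding a fifth unit that responds to no stimulus removes this constraint.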

The two remaining theorems were proposed by Joseph (Ref. 42),
and establish useful diagnostic procedures for determining the existence of
solutions in both alpha and gamma system perceptrons. As in Theorem 3,
the activity function of the A-unit $a_i$ is defined as

$$a_i^*(S_j) = \begin{cases} 1 & \text{if } a_i \text{ is active for } S_j \\ 0 & \text{otherwise} \end{cases}$$

For any n-vector, $X$, with components $x_j$, the bias number of
$a_i$ with respect to $X$ is defined as

$$b_i(X) = \sum_{j=1}^{n} a_i^*(S_j)\, x_j$$

This quantity is clearly related to the bias ratio (defined in 5.4) if $X$ is
taken to be the class-assignment vector for the n stimuli. We will denote
by $X^0$ any n-vector $X$ whose components $x_j$ do not disagree in sign with
the required classification, C(W), i.e., $x_j \ge 0$ if $S_j$ is in the positive
class, and $x_j \le 0$ if $S_j$ is in the negative class. A separate symbol is
reserved for a vector in which the inequalities are strict (no zero components).


THEOREM 9: Given an $\alpha$-perceptron, and a classification C(W), a
necessary and sufficient condition that the error correction
procedure reach a solution (in finite time, with arbitrary
starting point) is that there exists no non-zero $X^0$ such
that $b_i(X^0) = 0$ for all $i$.


PROOF: For convenience, an un-normalized G-matrix will be assumed.
For such a matrix,

$$g_{jk} = \sum_i a_i^*(S_j)\, a_i^*(S_k) = n_{jk}$$

where $n_{jk}$ is the number of A-units in the set responding to both $S_j$ and
$S_k$. Hence, for any n-vector $X$,

$$X'GX = \sum_j \sum_k x_j\, g_{jk}\, x_k = \sum_i \sum_j \sum_k x_j\, a_i^*(S_j)\, a_i^*(S_k)\, x_k$$

But

$$\sum_j a_i^*(S_j)\, x_j = b_i(X)$$

Hence

$$X'GX = \sum_i \left[ b_i(X) \right]^2$$

If the condition of the theorem holds, then $X^{0\prime} G X^0 > 0$ for
$X^0 \neq 0$. But from the proof of Theorem 4, it can be shown
that $X'GX \ge a\,\|X\|^2$ for $X = X^0$, where $a > 0$. Then the proof of the
correction procedure in Theorem 4 applies, and a solution exists, so that
the stated condition must be sufficient.

If the condition does not hold, then there is a non-zero $X^0$
such that $X^{0\prime} G X^0 = 0$. Since G is positive semidefinite, this implies that
$X^{0\prime} G = 0'$. Thus, $X^0$ is orthogonal to all the columns of G, and hence
to any linear combination of the columns of G. Since for an arbitrary
vector $Z$, $GZ$ is a linear combination of the columns of G, $GZ$ is
orthogonal to $X^0$. $X^0$ cannot be orthogonal to any vector $U$ in which
the signs of all $u_j$ agree with C(W), and hence it follows that there cannot
exist vectors $Z$ and $U$ such that $GZ = U$. This means that there
exists no solution to the classification problem, so the condition given must
be necessary. Q.E.D.
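
The identity on which the proof rests, that the quadratic form X'GX for the un-normalized G-matrix equals the sum of the squared bias numbers, can be checked numerically. The sketch below assumes that the activity functions are stored as a 0/1 table with one row per A-unit and one column per stimulus; the table and the function names are made up for illustration.

```python
def bias_numbers(activity, x):
    """b_i(X) = sum over j of a_i*(S_j) x_j, one value per A-unit (row)."""
    return [sum(a * xj for a, xj in zip(row, x)) for row in activity]


def g_matrix(activity):
    """Un-normalized G: g_jk counts the A-units responding to both S_j and S_k."""
    n = len(activity[0])
    return [[sum(row[j] * row[k] for row in activity) for k in range(n)]
            for j in range(n)]


def quadratic_form(g, x):
    """X'GX for a square matrix g and a vector x."""
    n = len(x)
    return sum(x[j] * g[j][k] * x[k] for j in range(n) for k in range(n))


if __name__ == "__main__":
    activity = [[1, 0, 1], [1, 1, 0], [0, 1, 1]]   # hypothetical activity table
    x = (1, -2, 1)
    print(quadratic_form(g_matrix(activity), x))          # 6
    print(sum(b * b for b in bias_numbers(activity, x)))  # 6
```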

COROLLARY: For an $\alpha$-system, the condition that there exist no
non-zero vector $X^0$ such that $b_i(X^0) = 0$ for all $i$
is equivalent to the condition that there exist $Z$ and
$U$ such that $GZ = U$ (where $U$ is in the same orthant
as C(W)). Alternatively, this condition is equivalent
to $X^{0\prime} G X^0 > 0$ for all non-zero $X^0$.

THEOREM 10: Given a $\gamma$-perceptron, and a classification C(W), a
necessary and sufficient condition that the error correction
procedure reach a solution (in finite time) is that there
exists no non-zero $X^0$ such that $b_i(X^0) = c$
for all $i$, where $c$ is a constant.


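PROOF: For the $\gamma$-system, the normalized G matrix consists of
elements

$$g_{jk} = \frac{1}{N_a} \left[ \sum_i a_i^*(S_j)\, a_i^*(S_k) - \frac{1}{N_a} \sum_i a_i^*(S_j) \sum_i a_i^*(S_k) \right]$$

It is readily seen that G is symmetric. For any n-vector $X$, $X'GX$
is given by

$$X'GX = \frac{1}{N_a} \left[ \sum_i b_i^2(X) - \frac{1}{N_a} \Big( \sum_i b_i(X) \Big)^2 \right]$$

We now define $\bar{b}(X)$ as

$$\bar{b}(X) = \frac{1}{N_a} \sum_i b_i(X)$$

From this, we see that

$$\sum_i \left[ b_i(X) - \bar{b}(X) \right]^2 = \sum_i b_i^2(X) - 2\,\bar{b}(X) \sum_i b_i(X) + N_a\, \bar{b}^2(X) = \sum_i b_i^2(X) - \frac{1}{N_a} \Big( \sum_i b_i(X) \Big)^2$$

Hence

$$X'GX = \frac{1}{N_a} \sum_i \left[ b_i(X) - \bar{b}(X) \right]^2$$

From this it follows, first of all, that G is positive definite or positive
semidefinite, as was the case for the $\alpha$-system. Secondly, it is seen
that $X'GX = 0$ if and only if $b_i(X) = c$ for all $i$. The proof now
proceeds exactly as in Theorem 9.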

COROLLARY: For a $\gamma$-system, the condition that there exists no
non-zero vector $X^0$ such that $b_i(X^0) = c$ for
all $i$ is equivalent to the condition that there exist $Z$
and $U$ such that $GZ = U$ (where $U$ is in the same
orthant as C(W)).


In practice, it is often possible to show that a given perceptron
does not permit a solution to a given classification problem by substituting
the classification vector itself, C(W), for the vector $X^0$ in the above
theorems, and computing the $b_i$. If these turn out to be zero for all
A-units, then no solution exists for either the alpha or gamma system. If
they are a constant other than zero, a solution may exist for the alpha
system, but not for the gamma system. If they are not all identical, then
a solution may exist for either system. While it is sufficient to take the
components of $X^0$ to be integers, the vector with all components $x_j = \pm 1$
is not always sufficient. For example, if the matrix is

$$\begin{pmatrix} 1 & 1 & 1 \\ 1 & 1 & 1 \\ 1 & 1 & 1 \end{pmatrix}$$

the $b_i$ will all be annihilated by $X = (1, -2, 1)$, but not by $X = (1, -1, 1)$.
The condition for the $\alpha$-system is equivalent to the requirement that there
should be no vector in the same orthant as C(W) which is orthogonal to the
linear manifold spanned by the activity vectors of the A-units.
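
The diagnostic procedure described above is easy to mechanize. The sketch below assumes that the bias number of an A-unit is the sum of the components of X over the stimuli to which the unit responds, and that the activity functions are stored as a 0/1 table with one row per A-unit. The function names and the wording of the returned messages are illustrative, and the caller is assumed to supply a non-zero X lying in the orthant of the classification.

```python
def bias_numbers(activity, x):
    """b_i(X): sum of x_j over the stimuli S_j to which A-unit i responds."""
    return [sum(a * xj for a, xj in zip(row, x)) for row in activity]


def diagnose(activity, x):
    """Apply the rule of thumb from the text to one trial vector X."""
    b = bias_numbers(activity, x)
    if all(v == 0 for v in b):
        return "no solution for either the alpha or gamma system"
    if all(v == b[0] for v in b):
        return "no solution for the gamma system; the alpha system is not excluded"
    return "a solution is not excluded for either system"


if __name__ == "__main__":
    activity = [[1, 1, 1], [1, 1, 1], [1, 1, 1]]    # the matrix of the example
    print(bias_numbers(activity, (1, -2, 1)))        # [0, 0, 0]
    print(bias_numbers(activity, (1, -1, 1)))        # [1, 1, 1]
    print(diagnose(activity, (1, -2, 1)))
```

The example shows why a trial vector with components of unit magnitude can miss a case that an integer vector detects.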
6. Q-FUNCTIONS AND BIAS RATIOS IN ELEMENTARY PERCEPTRONS


Thus far, we have been mainly concerned with the general
"qualitative" properties of elementary perceptrons. In the present chapter,
the groundwork for a quantitative analysis of their performance will be
presented. In the theorems of Chapter 5, it was shown that the existence
and attainability of solutions, in an elementary perceptron, depend strongly
on the properties of the G-matrix. Each element of this matrix, $g_{ij}$,
is a measure of the generalization of reinforcement from stimulus $S_j$ to $S_i$.
This generalization coefficient, $g_{ij}$, varies with the measure of the set of
A-units which respond jointly to $S_i$ and $S_j$. Until now, the actual
quantitative measures of these sets have not been taken into consideration,
and only the formal properties of the matrix G have been considered. The
Q-functions, which are introduced in this chapter, represent the probabilities
that an A-unit in a specified class of perceptrons will respond to a
particular stimulus, or will respond jointly to a designated set of stimuli.
These Q-functions not only determine the expected values of the
generalization coefficients, $g_{ij}$, but enter into the analysis of variability of
perceptron performance as well, as will be seen in the following chapter.

6.1 Definitions and Notation


The Q-functions, defined below, are always specific to a
particular class of perceptrons in which the origin point configurations of
the A-units have been selected according to some designated set of rules
from a specified S-set or retina. The functions Q are defined only for
simple A-units, $a_i$, which are said to be active if the algebraic sums
of their input signals, $\alpha_i$, are equal to or greater than their thresholds,
$\theta_i$. For such A-units, Q represents the probability of drawing an
A-unit at random from the specified distribution which responds to each of
a specified set of stimuli. The notation employed is as follows:

$Q_i$ = probability that an A-unit in a specified class of
perceptrons responds to stimulus $S_i$.

$Q_{ij}$ = probability that an A-unit in a specified class of
perceptrons responds to stimulus $S_i$ and also to
stimulus $S_j$.

$Q_{ij\ldots}$ = probability that an A-unit in a specified class of
perceptrons responds to each of the stimuli $S_i, S_j, \ldots$


6.2 Models to be Analyzed

Three types of models will be considered which differ in the
rules by which connections are made between S-units and A-units. It turns
out that for the three cases, the distribution of input signals to the A-units
is expressed in terms of binomial, Poisson, and normal random variables,
respectively. These models are therefore named binomial, Poisson, and
Gaussian models.


6.2.1 Binomial Models


Figure 6. Illustration of typical S to A-unit connections (arrowheads
indicate randomly selected terminations). In Gaussian models, the values of
the connections (shown here as ±1) are normal random variables. Panels:
(a) binomial model, with x = 2, y = 1 (input connections to each A-unit,
with random origins); (b) Poisson model, with constrained origins (outputs
from each S-unit, with random terminations); (c) Poisson model, with random
origins (origin and terminal points chosen at random for each connection).

In a binomial model the input signal, $\alpha_i$, received by
unit $a_i$, is distributed as the difference of two binomially distributed
random variables. This model characterizes a type of perceptron in which
each A-unit receives a fixed number of connections from the "retina",
consisting of exactly $x$ "excitatory" and $y$ "inhibitory" connections. Each
of the excitatory connections has the value +1, and each inhibitory connection
has the value -1. The threshold, $\theta$, is assumed to be fixed for all A-units.
The origins of the connections to an A-unit are selected independently, with
uniform probability, from the entire set of S-units (or retinal points).
Specifically, a set of equiprobable origin configurations can be constructed
as follows: Let there be $x + y$ connections, numbered from 1 to $x + y$. Let the
S-units be numbered from 1 to $N_s$. Then the set of all possible sequences
of $x + y$ integers, each having a value in the range $1 \le n \le N_s$, corresponds


to  the  complete  set  of  A-units.  In  this  model,  the  number  of  distinguishable 
A-units  possible  for  a  retina  of  points  1°  ^  ' 


In the binomial model, Q-functions do not depend on the number
of sensory units, but on the fraction of them which are illuminated. A variation
of this model has been analyzed in Ref. 79, where the additional constraint is
introduced that no two connections to a single A-unit can originate from the
same S-unit. It has been shown that for moderately large numbers of S-units,
this model is practically indistinguishable from the true binomial model
described above.


* The derivation of this formula can be found in Feller, Ref. 21, page 52.


6.2.2 Poisson Models


In a Poisson model, $\alpha_i$ is distributed as the difference of
two Poisson-distributed random variables. In this model, it is assumed
that the number of input connections to an A-unit is not fixed, but is a
random variable. The model corresponds to one of two situations, the
equations for the Q-functions being identical for both:

(1) In the constrained origin model, each S-unit emits a fixed number of
output connections, consisting of excitatory and inhibitory connections
(with values +1 and -1, respectively). Terminal points are selected at random
from a set of $N_a$ A-units. For the model to hold exactly, $N_s$ and $N_a$
should both be infinite, the ratio $N_s/N_a$ being a parameter of the system.
For finite $N_s$ and $N_a$, the model remains a close approximation.

(2)  In  the  random  origin  model,  a  set  of  A  excitatory  and  N  ^  inhibitory 
connections  are  each  independently  assigned  an  origin  and  a  terminus  at 
random,  from  a  set  of  S-units  and  A-units,  with  uniform  probabilities.  In 
this  case,  for  the  model  to  hold  exactly,  the  numbers  N  ^  ,  N ^  and 

f'J  \  ■  J 

should  all  be  infinite,  with  -  ^  being  a  parameter  of  the  system; 

as  in  the  previous  case,  however,  the  model  is  a  close  approximation  for 
finite  systems. 

In  the  Poisson  model,  for  Case  (1),  the  number  of  possible  A- 
units  is  '  ; /  /  "  ,  /  /  •  .  For  Case  (2),  the  number  of 

.  -V,  N, 

possible  A-units  is  -f  i)  "  ( /  ij  •  .  The  binomial  model,  the 

constrained-origin  Poisson  model,  and  the  random-origin  Poisson  model 
yield  increasingly  large  sets  of  possible  A-units,  for  the  same  numbers  of 
S-units,  A-units,  and  connections. 

6.2.3  Gaussian  Models 


In the Gaussian case, $\alpha_i$ is distributed as the difference
of two normally distributed random variables, i.e., $\alpha_i$ is normally
distributed. While both of the above cases converge to a Gaussian model
as the number of input connections to an A-unit becomes large, we shall
be concerned here with a model in which the number of connections remains
finite, but the values of the connections are normally distributed.

6.3 Analysis of $Q_i$


For both the binomial and Poisson models, $Q_i$, the probability
that an A-unit is activated by stimulus $S_i$, is given by the probability that
the total input signal $\alpha_i$ is equal to or greater than the threshold, $\theta$.
Specifically,

$$Q_i = \sum_{\alpha_i \ge \theta} P(\alpha_i) = \sum_{E=\theta}^{E_{\max}} \sum_{I=0}^{E-\theta} P_x(E)\, P_y(I) \tag{6.1}$$

where $E_{\max} = x$ for the binomial model, and $E_{\max} = \infty$ for the Poisson model;

$P_x(E)$ = probability that exactly $E$ of the excitatory connections
to an A-unit originate from active S-points;

$P_y(I)$ = probability that exactly $I$ of the inhibitory connections
to an A-unit originate from active S-points.

For the binomial model,

$$P_x(E) = \binom{x}{E} R_i^{E} (1 - R_i)^{x-E}, \qquad P_y(I) = \binom{y}{I} R_i^{I} (1 - R_i)^{y-I} \tag{6.2}$$

where $R_i$ = fraction of retinal points (S-units) activated by stimulus $S_i$.
For the Poisson model,

$$P_x(E) = \frac{(x R_i)^{E}}{E!}\, e^{-x R_i}, \qquad P_y(I) = \frac{(y R_i)^{I}}{I!}\, e^{-y R_i} \tag{6.3}$$

where

$x$ = expected number of excitatory input connections
to an A-unit;

$y$ = expected number of inhibitory input connections
to an A-unit.
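
Equations (6.1) to (6.3) translate directly into a few lines of code. The sketch below assumes an integer threshold, reads the double sum in (6.1) as running over every pair (E, I) with E - I at least equal to the threshold, and cuts off the infinite Poisson sum at a finite `e_max`. The function names and the cutoff are illustrative assumptions.

```python
from math import comb, exp, factorial


def binomial_pmf(n, k, p):
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def poisson_pmf(mean, k):
    """Probability that a Poisson variable with the given mean equals k."""
    return mean**k * exp(-mean) / factorial(k)


def q_binomial(r, x, y, theta):
    """Q_i for x excitatory and y inhibitory connections, retinal fraction r."""
    return sum(
        binomial_pmf(x, e, r) * binomial_pmf(y, i, r)
        for e in range(max(theta, 0), x + 1)
        for i in range(0, min(y, e - theta) + 1)
    )


def q_poisson(r, x, y, theta, e_max=60):
    """Q_i for expected connection numbers x and y; the sum over E stops at e_max."""
    return sum(
        poisson_pmf(x * r, e) * poisson_pmf(y * r, i)
        for e in range(max(theta, 0), e_max + 1)
        for i in range(0, e - theta + 1)
    )


if __name__ == "__main__":
    for r in (0.1, 0.3, 0.5):
        print(r, q_binomial(r, 5, 5, 2), q_poisson(r, 5, 5, 2))
```

For the small connection numbers considered in this chapter a cutoff of 60 leaves a negligible remainder; larger expected numbers of connections would call for a larger cutoff.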


$P(\alpha_i)$ for the Poisson model can be expressed alternatively by
the following identity (pointed out by Prof. H. D. Block):

$$P(\alpha_i) = e^{-(x+y) R_i} \left( \frac{x}{y} \right)^{\alpha_i / 2} I_{\alpha_i}\!\left( 2 R_i \sqrt{xy} \right)$$

where $I_{\alpha}$ = Bessel function of an imaginary argument, given by

$$I_{\alpha}(z) = \sum_{m=0}^{\infty} \frac{(z/2)^{2m+\alpha}}{m!\,(m+\alpha)!}$$

The use of this equation makes it possible to compute Q-functions
for the Poisson model by hand, with the aid of tables of Bessel functions (cf.
Ref. 37, pp. 224-233).
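
The Bessel-function form can be compared with a direct summation. The sketch below assumes the standard expression for the difference of two independent Poisson variables with means $xR_i$ and $yR_i$, namely $e^{-(x+y)R_i}(x/y)^{\alpha/2} I_{\alpha}(2R_i\sqrt{xy})$, and evaluates the modified Bessel function from its power series for a non-negative integer order. The function names and the number of series terms are illustrative choices.

```python
from math import exp, factorial, sqrt


def poisson_pmf(mean, k):
    """Probability that a Poisson variable with the given mean equals k."""
    return mean**k * exp(-mean) / factorial(k)


def bessel_i(order, z, terms=60):
    """Modified Bessel function of the first kind, integer order >= 0, by series."""
    return sum(
        (z / 2) ** (2 * m + order) / (factorial(m) * factorial(m + order))
        for m in range(terms)
    )


def p_alpha_direct(alpha, r, x, y, terms=60):
    """P(E - I = alpha) summed directly over I, for alpha >= 0."""
    return sum(
        poisson_pmf(x * r, alpha + i) * poisson_pmf(y * r, i) for i in range(terms)
    )


def p_alpha_bessel(alpha, r, x, y):
    """The same probability through the Bessel-function form (requires y > 0)."""
    z = 2 * r * sqrt(x * y)
    return exp(-(x + y) * r) * (x / y) ** (alpha / 2) * bessel_i(alpha, z)


if __name__ == "__main__":
    for alpha in range(4):
        print(alpha, p_alpha_direct(alpha, 0.3, 5, 3), p_alpha_bessel(alpha, 0.3, 5, 3))
```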


For the Gaussian model, equation (6.1) requires an additional
factor representing the distribution of value for each of the connections.
Specifically, if the absolute values of both excitatory and inhibitory connections
are distributed with mean $\mu$ and standard deviation $\sigma$, we have

$$Q_i = \sum_{E=0}^{E_{\max}} \sum_{I=0}^{I_{\max}} P_x(E)\, P_y(I) \int_{\theta}^{\infty} \phi(D_{E,I})\, dD \tag{6.4}$$

where

$$\phi(D) = \frac{1}{\sqrt{2\pi}\, \sigma_D} \exp\!\left[ -\frac{(D - \mu_D)^2}{2 \sigma_D^2} \right], \qquad \mu_D = (E - I)\,\mu, \qquad \sigma_D^2 = (E + I)\,\sigma^2$$

$P_x(E)$ and $P_y(I)$, in equation (6.4), are given either by (6.2) or (6.3),
depending on whether the number of input connections to an A-unit is fixed
(as in the binomial model) or random (as in the Poisson model).
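
A numerical version of the Gaussian case is sketched below for a fixed number of connections. It assumes that the signal contributed by E active excitatory and I active inhibitory connections is normal with mean (E - I) times mu and variance (E + I) times sigma squared, which is one reading of (6.4), and that the binomial probabilities of (6.2) supply the weights. The function names are hypothetical.

```python
from math import comb, erfc, sqrt


def binomial_pmf(n, k, p):
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def tail_probability(mean, std, theta):
    """Pr(D >= theta) for a normal variable D; a zero spread is treated as a point mass."""
    if std == 0.0:
        return 1.0 if mean >= theta else 0.0
    return 0.5 * erfc((theta - mean) / (std * sqrt(2.0)))


def q_gaussian(r, x, y, theta, mu, sigma):
    """Q_i with a fixed number of connections and normally distributed values."""
    total = 0.0
    for e in range(x + 1):
        for i in range(y + 1):
            weight = binomial_pmf(x, e, r) * binomial_pmf(y, i, r)
            total += weight * tail_probability((e - i) * mu, sqrt(e + i) * sigma, theta)
    return total


if __name__ == "__main__":
    for sigma in (0.0, 0.25, 1.0):
        print(sigma, q_gaussian(0.3, 3, 2, 1, 1.0, sigma))
```

With sigma set to zero and mu set to one, the connection values are exactly ±1 and the function reduces to the binomial $Q_i$ of (6.1).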

Figures 7 and 8 show representative families of curves for $Q_i$
as a function of $R_i$, for the binomial and Poisson models, respectively.
Note that both models are very similar in their basic characteristics.
Specifically:

1.  In  all  cases,  for  /  ■  •  J  and  '  '  v  ,  increases  monotonically 

with  /' 

2. For purely excitatory models ($y = 0$), $Q_i$ goes to 1.0 as $R_i$
approaches 1.0. (Figures 7a and 8a).

3.  For  models  with  e>  Q[  goes  to  zero  as  approaches  1.0. 

(Figures  7b  and  8b). 

4. For $x = y$, $Q_i$ tends to remain invariant except for very small or
very large values of $R_i$. The range over which $Q_i$ tends to
remain constant is increased if the number of connections becomes
large (Figs. 7c and 8c). In the limit, with small $\theta$ and large $x$
and $y$, $Q_i$ approaches .5 for all values of $R_i$ except 0 and 1.

5. Keeping $x$ fixed, then for small $\theta$, $Q_i$ is generally greater
for the binomial model than for the Poisson model. For large $\theta$, $Q_i$
is greater for the Poisson model.

6. For the binomial model, $Q_i = 0$ for $x < \theta$, while for the Poisson
model, $Q_i = 0$ only if $x = 0$.

Figure 8. $Q_i$ as function of retinal area illuminated, for Poisson model.

6.4 Analysis of $Q_{ij}$


$Q_{ij}$ is the probability that an A-unit is activated by each of
two stimuli, $S_i$ and $S_j$. For both the binomial and Poisson models, $Q_{ij}$
can be expressed by the equation:

$$Q_{ij} = \sum_{\substack{E_i + E_c - I_i - I_c \,\ge\, \theta \\ E_j + E_c - I_j - I_c \,\ge\, \theta}} P_x(E_i, E_j, E_c)\, P_y(I_i, I_j, I_c) \tag{6.5}$$

where

$\theta$ = threshold of A-units

$E_i$ = number of excitatory connections originating from points
illuminated by $S_i$ but not by $S_j$

$E_j$ = number of excitatory connections originating from points
illuminated by $S_j$ but not by $S_i$

$E_c$ = number of excitatory connections originating from points
common to $S_i$ and $S_j$

$I_i$ = number of inhibitory connections originating from points
illuminated by $S_i$ but not by $S_j$

$I_j$ = number of inhibitory connections originating from points
illuminated by $S_j$ but not by $S_i$

$I_c$ = number of inhibitory connections originating from points
common to $S_i$ and $S_j$

The point sets involved in the analysis of $Q_{ij}$ are illustrated in Figure 9.
For the binomial model, the required probabilities are given by the
multinomial equations:

$$P_x(E_i, E_j, E_c) = \frac{x!}{E_i!\, E_j!\, E_c!\, (x - E_i - E_j - E_c)!}\, A_i^{E_i} A_j^{E_j} C^{E_c} (1 - A_i - A_j - C)^{x - E_i - E_j - E_c}$$

$$P_y(I_i, I_j, I_c) = \frac{y!}{I_i!\, I_j!\, I_c!\, (y - I_i - I_j - I_c)!}\, A_i^{I_i} A_j^{I_j} C^{I_c} (1 - A_i - A_j - C)^{y - I_i - I_j - I_c} \tag{6.6}$$

where $C$ = proportion of retinal points illuminated both by $S_i$ and $S_j$;

$A_i = R_i - C$, where $R_i$ is the proportion of retinal points illuminated
by $S_i$;

$A_j = R_j - C$, where $R_j$ is the proportion of retinal points illuminated
by $S_j$.


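For the Poisson model (where $x$ and $y$ are the expected numbers of
excitatory and inhibitory connections to an A-unit),

$$P_x(E_i, E_j, E_c) = \frac{(x A_i)^{E_i}}{E_i!}\, \frac{(x A_j)^{E_j}}{E_j!}\, \frac{(x C)^{E_c}}{E_c!}\, e^{-x (A_i + A_j + C)}$$

$$P_y(I_i, I_j, I_c) = \frac{(y A_i)^{I_i}}{I_i!}\, \frac{(y A_j)^{I_j}}{I_j!}\, \frac{(y C)^{I_c}}{I_c!}\, e^{-y (A_i + A_j + C)} \tag{6.7}$$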


As in the case of $Q_i$, the Gaussian model for $Q_{ij}$ requires
an additional factor representing the normal distribution of connection values.
The components of the input signal, $\alpha$, which originate from the unique
S-units in $S_i$, the unique points in $S_j$, and from the common retinal
set are designated $D_i$, $D_j$, and $D_c$, respectively. By analogy to
(6.4),

$$\mu_{D_i} = (E_i - I_i)\,\mu, \qquad \sigma_{D_i}^2 = (E_i + I_i)\,\sigma^2$$

and similarly for $D_j$ and $D_c$, with $\phi(D_i)$, $\phi(D_j)$, and $\phi(D_c)$ defined
as in (6.4).

Then,

$$Q_{ij} = \sum_{E_i, E_j, E_c} \; \sum_{I_i, I_j, I_c} P_x(E_i, E_j, E_c)\, P_y(I_i, I_j, I_c) \int_{-\infty}^{\infty} \int_{\theta - D_c}^{\infty} \int_{\theta - D_c}^{\infty} \phi(D_i)\, \phi(D_j)\, \phi(D_c)\, dD_i\, dD_j\, dD_c \tag{6.8}$$
For some purposes, the distribution of the input signals, $\alpha_i$ and $\alpha_j$, is
of interest. The joint probability, $P(\alpha_i, \alpha_j)$, is given by


i:., 

^  f  • ,  /-  .  /  ■  ,  7  •  ,  > 


r»o 

/’ 


•  J 


,/■  !  I  /(o-- ■  -  A  A//.> 

’J.  '/'•  i  ^  ■  I  ^  I 


,  (6.9) 


-  -  .•>' 


Tt  should  be  noted  that  ^  is  a  special  case  of  these  equations,  for 

which  -^1  •  -  /<  ■  .  Tables  of  •  for  binomial  and  Poisson  models 

have  been  published  in  Ref.  87. 


Figure 10. $Q_{ij}$ as a function of the relative intersection, for binomial
model. $R_i = R_j$ in all cases. Panel (c): variation with stimulus size ($R$).

Figure 11. $Q_{ij}$ as function of the relative intersection, for Poisson
model. $R_i = R_j$ in all cases.

Figures 10 and 11 illustrate the quantitative properties of $Q_{ij}$,
as a function of $C$, the measure of the intersection of stimuli $S_i$ and $S_j$
on the "retina". For convenience of representation, $Q_{ij}$ is actually plotted
as a function of the relative intersection (or proportional intersection), $C/R$,
$R_i$ and $R_j$ being equal for all cases shown. Note that for $C/R = 1$,
$Q_{ij} = Q_i = Q_j$. The main features of these curves are:

1. In all cases, $Q_{ij}$ increases monotonically with $C$.

2. For large $\theta$, $Q_{ij}$ tends to remain close to zero, except for
stimuli which approach perfect identity ($C/R$ close to 1.0).

3.  For  large  values  of  V  ,  tends  to  accelerate  more  rapidly 

as  C  approaches  1  . 

4. For the binomial model, $Q_{ij}$ for disjoint or well separated stimuli
($C \approx 0$) may have a maximum with respect to $R$. This effect
is not found in the Poisson model. (Figs. 10c and 11c.)

5. For equivalent parameters, $Q_{ij}$ tends to show a sharper "shoulder"
in the binomial model than the Poisson model.

The second of these properties is an important factor in
determining the discriminative capability of a perceptron. It is shown best
in terms of the conditional probability, $Q_{i|j}$, that an A-unit which responds
to $S_i$ also responds to $S_j$. $Q_{i|j}$ is equal to $Q_{ij}/Q_i$, and is shown for
several typical cases in Fig. 12. Note that for large values of $\theta$, the
probability that an A-unit responding to $S_i$ responds to a second stimulus,
$S_j$, is virtually zero, unless the stimuli approach perfect identity. The
difference between the binomial and Poisson models is shown most clearly
in Figures 12(a) and 12(b). Figure 12(c) demonstrates that the conditional
probability depends only slightly on stimulus size. Additional curves for
these functions can be found in Refs. 79 and 80.

Figure 12. Conditional probability ($Q_{i|j}$) that an A-unit responding
to $S_i$ also responds to $S_j$.


In analyzing the gamma system, it will be seen that the
conditions under which $Q_{ij} = Q_i Q_j$ are of particular interest, since for
the gamma system the expected value of $g_{ij}$ is zero for such conditions.
In the binomial model, $Q_{ij} = Q_i Q_j$ if $C = R_i R_j$. This condition
will tend to be met if the stimuli are randomly chosen sets of S-points,
the expected intersection of any two such sets being equal to the product of
the measures of the sets. It can readily be seen that under these conditions,
the probability that an origin point which is in $S_j$ is also in $S_i$ is the same
as the probability that an origin point which is not in $S_j$ happens to be in $S_i$;
in other words, the probability that the origin of a connection is in $S_i$ does not
depend on whether or not it is in $S_j$, and consequently the response to $S_j$
is independent of the response to $S_i$, yielding $Q_{ij} = Q_i Q_j$. In the Poisson
model, however, $Q_{ij} = Q_i Q_j$ only if $C = 0$ (i.e., for disjoint stimuli) since
the connections received from any disjoint subset of S-units are independent
of connections (or signals) from any other subset.
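
The independence property of the binomial model can be verified numerically. The sketch below assumes the multinomial form of (6.6), with each connection falling in the part of $S_i$ outside $S_j$, the part of $S_j$ outside $S_i$, the common part, or the rest of the retina with probabilities $A_i$, $A_j$, $C$ and $1 - A_i - A_j - C$, and an integer threshold. The function names are hypothetical.

```python
from math import comb, factorial


def q_binomial(r, x, y, theta):
    """Q_i: probability that E - I >= theta, with E ~ Bin(x, r) and I ~ Bin(y, r)."""
    return sum(
        comb(x, e) * r**e * (1 - r) ** (x - e) * comb(y, i) * r**i * (1 - r) ** (y - i)
        for e in range(x + 1) for i in range(y + 1) if e - i >= theta
    )


def multinomial_p(n, counts, probs):
    """Probability of `counts` in three regions, the remaining draws falling elsewhere."""
    rest = n - sum(counts)
    p_rest = max(0.0, 1.0 - sum(probs))
    coef = factorial(n) // factorial(rest)
    value = p_rest**rest
    for k, p in zip(counts, probs):
        coef //= factorial(k)
        value *= p**k
    return coef * value


def triples(n):
    """All (a, b, c) of non-negative integers with a + b + c <= n."""
    for a in range(n + 1):
        for b in range(n + 1 - a):
            for c in range(n + 1 - a - b):
                yield a, b, c


def q_ij_binomial(r_i, r_j, c, x, y, theta):
    """Q_ij: both input signals reach the threshold."""
    probs = (r_i - c, r_j - c, c)
    total = 0.0
    for e_i, e_j, e_c in triples(x):
        p_e = multinomial_p(x, (e_i, e_j, e_c), probs)
        for i_i, i_j, i_c in triples(y):
            if e_i + e_c - i_i - i_c >= theta and e_j + e_c - i_j - i_c >= theta:
                total += p_e * multinomial_p(y, (i_i, i_j, i_c), probs)
    return total


if __name__ == "__main__":
    r, x, y, theta = 0.5, 3, 2, 1
    print(q_ij_binomial(r, r, r * r, x, y, theta), q_binomial(r, x, y, theta) ** 2)
```

Setting the intersection equal to the product of the two stimulus measures reproduces $Q_i Q_j$, while setting it equal to the full stimulus measure (identical stimuli) reproduces $Q_i$.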

6.5 Analysis of $Q_{ijk}$


In the following chapter, it will be seen that the expected responses
of a simple perceptron can generally be determined from the functions $Q_i$
and $Q_{ij}$. The variability of performance in a class of perceptrons, however,
will be seen to depend on the joint probability, $Q_{ijk}$, that an A-unit
responds to each of three stimuli, $S_i$, $S_j$, and $S_k$. The equations are a
straightforward generalization of those employed in the last section for $Q_{ij}$.
Specifically, there are now seven excitatory and seven inhibitory signal
components to be considered:

$E_i$ = excitatory signal from S-units responding to $S_i$
but not to $S_j$ or $S_k$

$E_j$ = excitatory signal from S-units responding to $S_j$
but not to $S_i$ or $S_k$

$E_k$ = excitatory signal from S-units responding to $S_k$
but not to $S_i$ or $S_j$

$E_{ij}$ = excitatory signal from S-units responding to $S_i$
and $S_j$ but not $S_k$

$E_{ik}$ = excitatory signal from S-points responding to $S_i$
and $S_k$ but not $S_j$

$E_{jk}$ = excitatory signal from S-points responding to $S_j$
and $S_k$ but not $S_i$

$E_{ijk}$ = excitatory signal from S-points responding to all
three stimuli.

Inhibitory components are defined analogously. This yields the equation

$$Q_{ijk} = \sum_{\alpha_i \ge \theta,\; \alpha_j \ge \theta,\; \alpha_k \ge \theta} P_x(E_i, E_j, E_k, E_{ij}, E_{ik}, E_{jk}, E_{ijk})\, P_y(I_i, I_j, I_k, I_{ij}, I_{ik}, I_{jk}, I_{ijk}) \tag{6.10}$$

where

$$\alpha_i = E_i + E_{ij} + E_{ik} + E_{ijk} - I_i - I_{ij} - I_{ik} - I_{ijk}$$

$$\alpha_j = E_j + E_{ij} + E_{jk} + E_{ijk} - I_j - I_{ij} - I_{jk} - I_{ijk}$$

$$\alpha_k = E_k + E_{ik} + E_{jk} + E_{ijk} - I_k - I_{ik} - I_{jk} - I_{ijk}$$

The multinomial and Poisson probabilities employed in (6.10) for the
binomial and Poisson models, respectively, are obtained by extension
of (6.6) and (6.7), with appropriate measures for the various double and
triple intersections among the stimuli.

6.6  Bias  Ratios  of  A-units 


Bias ratios were defined in Section 5.4 as the ratio of the
number of stimuli in the positive class to the number of stimuli in the
negative class, which activate an A-unit. In Theorem 2, it was shown
that there must be some variation in the bias ratios of the A-units in a
perceptron, if a solution to a given classification is to exist, and Theorems 9
and 10 showed that the closely related "bias numbers" yield necessary and
sufficient conditions for solutions. Clearly, the distribution of bias ratios
depends on the probabilities $Q_{ij\ldots}$, that the A-units will respond to
various possible sets of stimuli, $S_i, S_j, \ldots$. Rather than undertake
a detailed analysis of bias ratios, empirical data are presented for a typical
case, to illustrate how we might expect the "responsiveness" of A-units to
different classes of stimuli to be distributed. These data were obtained by
a Monte Carlo procedure, in which 10,000 A-units were tested on a digital
computer to determine to how many stimuli of each class they responded.*

* The program was written by A. Geoffrion, for the Burroughs 220
computer at Cornell University.

The "retina" consists of a 20 by 20 mosaic of S-units, and the stimuli
consist of 4 by 20 bars, placed vertically or horizontally on the retina, in all
possible positions. The retina is assumed to be toroidally connected, so
that bars placed near one edge of the field may re-enter at the opposite
edge. Thus, there are twenty possible horizontal bars (the positive class)
and twenty possible vertical bars (the negative class). This universe will
be used as a standard one in a number of learning experiments, to be
analyzed in the following chapters. Table 1 shows the number of A-units
out of 10,000 responding to each possible combination of $N^+$ horizontal bars
and $N^-$ vertical bars. An A-unit which responds to 4 horizontal and 6 vertical
bars, for example, is tallied in the 5th row and 7th column of the table. Each
A-unit had five excitatory and five inhibitory connections, and a threshold of 2.
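
An experiment of this kind is straightforward to repeat. The sketch below is not the original program. It assumes the binomial model of Section 6.2.1 (origins drawn independently and uniformly from the retina, connection values of +1 and -1), and its function names, seed and sample size are arbitrary, so its tallies will differ in detail from those of Table 1.

```python
import random
from collections import Counter

SIZE = 20  # the retina is a SIZE by SIZE torus


def bar(start, width, horizontal):
    """Set of retinal points (row, col) covered by a bar that wraps around the torus."""
    band = [(start + k) % SIZE for k in range(width)]
    if horizontal:
        return {(r, c) for r in band for c in range(SIZE)}
    return {(r, c) for r in range(SIZE) for c in band}


def tally_units(n_units, x=5, y=5, theta=2, seed=0):
    """Count A-units by (horizontal bars, vertical bars) to which they respond."""
    rng = random.Random(seed)
    horizontal = [bar(s, 4, True) for s in range(SIZE)]
    vertical = [bar(s, 4, False) for s in range(SIZE)]
    table = Counter()
    for _ in range(n_units):
        # Origins of the excitatory and inhibitory connections of one A-unit.
        exc = [(rng.randrange(SIZE), rng.randrange(SIZE)) for _ in range(x)]
        inh = [(rng.randrange(SIZE), rng.randrange(SIZE)) for _ in range(y)]

        def responds(stimulus):
            signal = sum(p in stimulus for p in exc) - sum(p in stimulus for p in inh)
            return signal >= theta

        n_h = sum(responds(s) for s in horizontal)
        n_v = sum(responds(s) for s in vertical)
        table[(n_h, n_v)] += 1
    return table


if __name__ == "__main__":
    table = tally_units(10_000)
    for n_h in range(8):
        print(n_h, [table[(n_h, n_v)] for n_v in range(7)])
```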

For  stimuli  which  are  more  similar  to  one  another  (in  terms  of 
possible  intersection  of  S-sets)  than  horizontal  and  vertical  bars,  we  would 
expect  to  find  the  A-units  less  well  distributed,  and  a  greater  concentration 
around  the  diagonal.  One  would  also  expect  that  in  a  universe  in  which  the 
stimulus  classes  are  less  symmetric  in  their  properties,  the  distribution 
of  A-units  would  be  less  symmetric  than  that  shown  in  Table  1.  Table  2 
illustrates  both  of  these  features.  In  this  case,  the  "positive"  class 
consists  of  4  by  20  horizontal  bars,  just  as  before;  the  "negative"  class, 
however,  consists  of  a  set  of  6  by  20  horizontal  bars.  Again,  there  are 
twenty  members  of  each  class,  but  the  maximum  intersection  possible  between 
stimuli  of  the  positive  and  negative  class  is  much  greater  than  before,  and  the 
size  difference  introduces  an  asymmetry  which  was  not  previously  present. 

The toroidal retina has the convenient property of being unbounded and
isotropic, with a finite surface. Any relations which hold for a set of
stimuli projected onto the retina hold equally well if all stimuli are
displaced by any combination of horizontal and vertical translations.

This model (with Born-von Kármán boundary conditions) is easier to
analyze than a spherical retina which has similar properties.
TABLE 1

JOINT DISTRIBUTION OF 10,000 A-UNITS, WITH RESPECT TO NUMBERS OF
HORIZONTAL BARS AND NUMBERS OF VERTICAL BARS TO WHICH THEY RESPOND

                          N_v (VERTICAL BARS)
                     0     1     2     3     4     5     6
N_h            0    287   326   349     ?   324    63    27
(HORIZONTAL    1    315   392   378   ?76   306    71    30
BARS)          2    325   417   441   ?09   ?51    92    27
               3    324   382   39?   353   343    90    37
               4    330   351   ?64   340   305    68    24
               5     68    87    ?9    34    85    27     8
               6     32    36    ?4    27    26     7     2
               7      6     9     7     6     2     0     ?

(? marks a digit or entry that is illegible in the source; the last row is incomplete.)


TABLE 2

JOINT DISTRIBUTION OF 10,000 A-UNITS WITH RESPECT TO NUMBERS OF
4 X 20 AND 6 X 20 HORIZONTAL BARS TO WHICH THEY RESPOND

(The body of this table is illegible in the source. The surviving header is "(4 X 20 BARS)", with columns numbered 0 to 8.)


While the joint distributions illustrated here are not of great
utility in analyzing perceptron performance, they provide considerable
insight into what takes place within the association system when a perceptron
learns a classification of stimuli. Units situated on the diagonal (i.e., units
which respond equally to both classes of stimuli) are essentially "duds"; they
contribute little to a discrimination, and are as likely to be reinforced
positively as negatively. A-units which have a strong bias towards one class
or the other, however (those situated in the upper right or lower left corners
of the table), are useful "discriminators". In learning a classification, the
perceptron relies on combinations of such units, transmitting large-valued
signals to establish a bias towards the proper class when a stimulus appears.
7. PERFORMANCE OF ELEMENTARY α-PERCEPTRONS IN PSYCHOLOGICAL EXPERIMENTS


So far, only the formal properties of elementary perceptrons
have been analyzed, without regard to particular experimental situations
or procedures. We are now ready to begin a quantitative analysis of the
performance of these systems in "psychological" experiments, i.e.,
experiments in which the procedures and observations are analogous to
those which might be performed on a biological organism. A number of
such experiments were defined in Part I, Section 3.3. In this chapter, we
shall be chiefly concerned with discrimination experiments (cf. Section 3.3.1),
since the capabilities of elementary perceptrons are largely limited to this
category. Before going on to other types of systems, however, we will
consider what kinds of behavior might be expected of an elementary
system in generalization experiments, figure detection experiments, and
other problems which were discussed in Chapter 3. The analysis of
discrimination experiments which is reported here is basically similar to
that which was originally presented in Ref. 79. The former models have
been substantially simplified, however, and the analysis has been made
more rigorous, thanks largely to the work of R. D. Joseph (Ref. 41).

7.1  Discrimination  Experiments  with  S-controlled  Reinforcement


The first problem to be analyzed is that of a discrimination
experiment in which the perceptron is presented with a sequence of stimuli
from an environment, $W$, and is reinforced for each stimulus in the
sequence in accordance with a predetermined classification, $C(W)$, with
the reinforcement control constant, $\eta$, taking the sign of the required
response. The perceptron is then shown a test stimulus, $S_x$, and the
response to this stimulus is determined. The measure of performance for
a class of perceptrons (characterized by the parameters $N_a$, $\theta$, $x$, and
$y$ for a binomial model, or by $\theta$, $x$, and $y$ for a Poisson model)
is the probability that a perceptron from the specified class will give the
correct response to $S_x$ after having been "trained" with the specified
sequence of stimuli.

7.1.1  Notation  and  Symbols 


$S_j$ = the $j$th stimulus in the environment

$\rho_j$ = +1 if $S_j$ is in the positive class; -1 if $S_j$ is in the negative class

$e_{ij \ldots x}$ = 1 if the $i$th A-unit is active for $S_j, \ldots,$ and $S_x$; 0 otherwise

$Q_{j \ldots x}$ = probability that $e_{ij \ldots x} = 1$ (as defined in Chapter 6)

$T$ = duration (number of stimuli) of the training sequence

$v_i(T)$ = value of the connection from the $i$th A-unit after the training sequence

$c_i(x) = c_i(x,T) = a_i^*(x)\, v_i(T)$ = signal received by the R-unit on the $i$th connection when test stimulus $S_x$ is shown after the training sequence. The time $T$ will be understood unless otherwise specified.

$u_x = \sum_i c_i(x)$ = total input to the response unit when $S_x$ is shown after the training sequence. For present purposes, the symbol $u_x$ will be used, as in Chapter 5. Time $T$ is understood unless otherwise specified.

In terms of these symbols, the reinforcement rule for a quantized
α-system, with S-controlled reinforcement, can be represented by the
following expression for the change in $v_i$ when stimulus $S_j$ is shown:

$$\Delta v_i = \rho_j \, a_i^*(j) \tag{7.1}$$
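
The S-controlled rule can be written out directly. The sketch below is illustrative only: it assumes the A-unit activities are already known as a 0/1 table (`activity[i][j]` for A-unit `i` and stimulus `j`), that every reinforcement has unit magnitude, and that all connection values start at zero. The function names are not from the source.

```python
def train_s_controlled(activity, rho, sequence):
    """Apply S-controlled reinforcement for each stimulus index in `sequence`.

    activity[i][j] -- 1 if A-unit i is active for stimulus j, else 0
    rho[j]         -- +1 or -1, the assigned class of stimulus j
    Returns the list of connection values v_i.
    """
    v = [0] * len(activity)
    for j in sequence:
        for i, row in enumerate(activity):
            # Each active A-unit gains the sign of the required response.
            v[i] += rho[j] * row[j]
    return v

def response_input(activity, v, x):
    """Total input to the R-unit when test stimulus x is shown."""
    return sum(row[x] * v_i for row, v_i in zip(activity, v))
```

For example, with `activity = [[1, 0, 1], [0, 1, 0], [1, 1, 0]]` and `rho = [1, -1, 1]`, one showing of each stimulus gives `v = [2, -1, 0]`, and the input for each of the three stimuli has the sign of its class. Showing the stimuli in a different order gives the same values, since the rule only accumulates sums.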


7.1.2  Fixed  Sequence  Experiments:  Analysis 

The first case to be considered is that of a fixed training sequence,
in which a definite sequence of stimuli ($S_1, S_2, \ldots, S_T$) is shown to the
perceptron. In a later section, random training sequences will be considered.
The fixed sequence consists of a fixed (though not necessarily equal) number
of showings of each stimulus. For α-perceptrons, the order of occurrence
of these stimuli does not affect the results. All values $v_i$ are assumed to
be zero initially. The following analysis and theorem follow the treatment
of Joseph (Ref. 41).

If a given perceptron is shown a training sequence, it will place
a test stimulus in the positive class if $u_x$ is greater than zero, and in
the negative class if $u_x$ is less than zero. For the given perceptron,
training sequence, and test stimulus, $u_x$ is a determinate number.
Over the class of perceptrons, however, $u_x$ is a random variable.
In order to determine the probability that a perceptron from the specified
class will classify $S_x$ correctly, we must know the probability that $u_x$
has the correct sign. In order to obtain a conservative bound on the
probability of correct response to $S_x$, without making any assumptions
about the distribution of $u_x$, Joseph makes use of the Tchebysheff
inequality, which states that for any random variable $u$ with mean $\mu$
and variance $\sigma^2$,

$$\mathrm{Prob}\{u > 0\} \ge 1 - \frac{\sigma^2}{\mu^2} \quad \text{if } \mu > 0$$

$$\mathrm{Prob}\{u < 0\} \ge 1 - \frac{\sigma^2}{\mu^2} \quad \text{if } \mu < 0$$

Consequently, if the ratio $\mu^2/\sigma^2$ can be made arbitrarily large,
the probability that $u_x$ for a randomly selected perceptron will agree in
sign with its expected value over the class of perceptrons can be made
arbitrarily close to 1.* It thus becomes important, first of all, to know
whether or not the expected value of $u_x$ has the proper sign.


* If the one-sided inequality $\mathrm{Prob}\{u - \mu \ge t\} \le \sigma^2/(\sigma^2 + t^2)$, $t > 0$,
is used in place of the two-sided inequality $\mathrm{Prob}\{|u - \mu| \ge t\} \le \sigma^2/t^2$,
slightly sharper bounds may be achieved, i.e.,

$$\mathrm{Prob}\{u > 0\} \ge \frac{\mu^2}{\mu^2 + \sigma^2} \quad \text{if } \mu > 0$$

$$\mathrm{Prob}\{u < 0\} \ge \frac{\mu^2}{\mu^2 + \sigma^2} \quad \text{if } \mu < 0$$

In the range of interest, this additional sharpness is insignificant.
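
The two distribution-free bounds can be compared numerically. The sketch below is a small illustration under the stated definitions only (mean `mu`, variance `var` of the input signal over the class of perceptrons); the function names are illustrative.

```python
def two_sided_sign_bound(mu, var):
    """Lower bound on the probability that u has the sign of its mean,
    from the two-sided Tchebysheff inequality: 1 - var / mu^2."""
    if mu == 0:
        raise ValueError("the bound is only defined for a nonzero mean")
    return max(0.0, 1.0 - var / mu ** 2)

def one_sided_sign_bound(mu, var):
    """Slightly sharper bound from the one-sided inequality: mu^2 / (mu^2 + var)."""
    if mu == 0:
        raise ValueError("the bound is only defined for a nonzero mean")
    return mu ** 2 / (mu ** 2 + var)
```

With `mu = 3` and `var = 1` the two bounds are 8/9 and 9/10. The gap between them shrinks as the ratio of squared mean to variance grows, which is the sense in which the extra sharpness is insignificant in the range of interest.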
DEFINITION: $S_x$ will be called a positive stimulus (with respect to a
class of perceptrons, an environment, classification, and training sequence)
if the expected value of $u_x$ agrees in sign with the assigned class of $S_x$.
In terms of the symbols introduced above, $S_x$ is a positive stimulus if

$$\rho_x \, E(u_x) > 0$$

The expected value of $u_x$ for an α-perceptron (assuming
that all A-R unit connections start out with zero value) is obtained as
follows. Let $P_j$ = the number of times stimulus $S_j$ occurs in the
training sequence, divided by $T$, the total number of stimuli in the
sequence (i.e., the proportion of the training sequence which is $S_j$).
Then the value of the connection from unit $a_i$ at the end of the training
sequence will be (since the magnitude of $\eta$ is taken to be 1)

$$v_i = T \sum_j P_j \, \rho_j \, a_i^*(j)$$

where the sum is over all stimuli in $W$. Consequently, summing over all
A-units, the input signal to the response unit when the test stimulus $S_x$
occurs will be

$$u_x = \sum_i a_i^*(x) \, v_i = T \sum_i \sum_j P_j \, \rho_j \, e_{ijx} \tag{7.2}$$

The expected value of $u_x$ is therefore given by

$$E(u_x) = N_a T \sum_j P_j \, \rho_j \, Q_{jx} \tag{7.3}$$
From the above definition, it follows that $S_x$ is a positive stimulus (and
will tend to be correctly classified) if

$$\rho_x \sum_j P_j \, \rho_j \, Q_{jx} > 0$$

From Equation (7.3) it is clear that $E(u_x)$ increases linearly
with $N_a$. Let us now consider the variance of $u_x$. This is obtained
from the equation:

$$\sigma^2(u_x) = \sum_i \sigma^2\big(c_i(x)\big) + \sum_i \sum_{k \ne i} \mathrm{cov}\big(c_i(x), c_k(x)\big) \tag{7.4}$$


For the conditions currently being considered (an α-system with a
predetermined training sequence) the only source of variability in $c_i(x)$
is in the selection of the origin point configuration of the unit $a_i$. But if
we assume (as in all models thus far considered) that the A-units are all
chosen independently from a distribution of admissible origin configurations,
the covariances will all be zero, and $\sigma^2\big(c_i(x)\big)$ does not depend on $i$.
Therefore, the general equation (7.4) reduces to

$$\sigma^2(u_x) = N_a \, \sigma^2\big(c_i(x)\big) = N_a \left[ E\,c_i^2(x) - E^2 c_i(x) \right] \tag{7.5}$$

(See Rosenblatt, Ref. 79, pp. 82-83, for a more detailed algebraic
discussion of this equality.) Now, for an α-system,

$$c_i(x) = T \sum_j P_j \, \rho_j \, e_{ijx}$$

and

$$c_i^2(x) = T^2 \sum_j \sum_k P_j P_k \, \rho_j \rho_k \, e_{ijkx}$$

This yields, for the required expected values in (7.5),

$$E\,c_i(x) = T \sum_j \rho_j P_j \, Q_{jx}$$

and

$$E\,c_i^2(x) = T^2 \sum_j \sum_k \rho_j \rho_k P_j P_k \, Q_{jkx}$$

Substituting in (7.5) and simplifying, this yields

$$\sigma^2(u_x) = N_a T^2 \sum_j \sum_k \rho_j \rho_k P_j P_k \left( Q_{jkx} - Q_{jx} Q_{kx} \right) \tag{7.6}$$
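
The expressions for the mean and variance can be checked numerically on a toy case. The sketch below is an illustration under an added assumption: the distribution of A-unit types is represented by a short list of equally likely 0/1 activity patterns (one entry per stimulus), so that each Q value is simply the fraction of patterns active for all of the stimuli named. The function names and the toy patterns are hypothetical.

```python
from itertools import product

def moments_from_formulas(patterns, rho, P, x, n_a, T):
    """Mean and variance of u_x computed from the Q-function expressions."""
    n = len(rho)

    def Q(*stimuli):
        # Fraction of A-unit patterns active for every stimulus listed.
        return sum(all(p[s] for s in stimuli) for p in patterns) / len(patterns)

    mean = n_a * T * sum(P[j] * rho[j] * Q(j, x) for j in range(n))
    var = n_a * T ** 2 * sum(
        rho[j] * rho[k] * P[j] * P[k] * (Q(j, k, x) - Q(j, x) * Q(k, x))
        for j, k in product(range(n), repeat=2)
    )
    return mean, var

def moments_direct(patterns, rho, P, x, n_a, T):
    """Same quantities from the signal c_i(x) of each pattern, for comparison."""
    n = len(rho)
    c = [T * p[x] * sum(P[j] * rho[j] * p[j] for j in range(n)) for p in patterns]
    m = sum(c) / len(c)
    v = sum((c_i - m) ** 2 for c_i in c) / len(c)
    return n_a * m, n_a * v
```

With `patterns = [(1, 0, 1), (0, 1, 0), (1, 1, 0), (0, 0, 1)]`, `rho = [1, -1, 1]`, `P = [0.5, 0.25, 0.25]`, test stimulus 0, 100 A-units and `T = 8`, both routes give a mean of 200 and a variance of 600.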


Note that the variance depends on $Q_{jkx}$, while the expected value depends
only on $Q_{jx}$. This variance, like the expected value, is of the order of
$N_a$. We are now in a position to prove the following theorem (due to
Joseph):

THEOREM: Given a class of elementary α-perceptrons, a finite
stimulus world $W$, a classification $C(W)$, and a
training sequence; then for every $\epsilon > 0$, there exists
an $N$ such that if $N_a > N$, the probability
of selecting a perceptron which will correctly identify
the class of every positive stimulus will be greater
than $1 - \epsilon$.

PROOF: From the Tchebysheff inequality, we have seen that if
$\mu^2(u_x)/\sigma^2(u_x)$ can be made arbitrarily large, the probability
that $u_x$ will agree in sign with its expected value over
the class of perceptrons will approach unity.
It has also been demonstrated (Equations 7.3 and 7.6) that both $\mu(u_x)$
and $\sigma^2(u_x)$ are of the order of $N_a$; therefore, $\mu^2(u_x)/\sigma^2(u_x)$
will be of the order of $N_a$. Thus, for each positive stimulus, $S_x$,
the probability that $u_x$ agrees in sign with $\rho_x$ can be made arbitrarily
close to 1 by choosing $N_a$ sufficiently large. Suppose there are $n$ stimuli
in $W$. Then, for the $j$th positive stimulus there exists a quantity $N_j(\epsilon)$
such that if $N_a > N_j(\epsilon)$, the probability of selecting a perceptron
which fails to correctly identify $S_j$ will be less than $\epsilon/n$. If we let
$N = \max_j N_j(\epsilon)$, the condition required by the theorem is satisfied.
Q.E.D.

From Equations (7.3) and (7.6), it is seen that for a given set
of stimulus frequencies $P_j$, the ratio $\mu^2/\sigma^2$ does not depend on $T$.
Thus any number of repetitions of the same training sequence can occur
without affecting the performance of the system. Since $\mu^2/\sigma^2$ varies
linearly with $N_a$, the normalized ratio $\mu^2 / N_a \sigma^2$ forms a convenient
measure for the comparison of different perceptron models. Some numerical
values for typical cases will be considered in the following section.

While the above analysis permits us to obtain a rigorous lower
bound for the probability of correct identification of $S_x$ by a randomly
selected perceptron, it does not actually yield an estimate of this probability.
In order to estimate the probability of correct identification of $S_x$, it will
be assumed that $u_x$ is normally distributed. The justification for this
assumption was discussed in Rosenblatt, Ref. 79, and subsequent analysis
has shown that the approximation is very close, even for perceptrons with a
small number of A-units. Assuming a normal distribution, we have for
the probability of a positive response to $S_x$

$$P\{u_x > 0\} = \Phi\!\left( \frac{\mu(u_x)}{\sigma(u_x)} \right), \quad \text{where} \quad \Phi(z) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{z} e^{-t^2/2} \, dt \tag{7.7}$$
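
Under the normal assumption, a performance ratio converts directly into an estimated probability of correct response. The sketch below is an illustration, not the computation used for the published curves: it takes a ratio of squared mean to variance quoted for a 100 A-unit system, rescales it linearly with the number of A-units (as the analysis above permits), and evaluates the normal integral with `math.erf`. The function name and argument names are illustrative.

```python
import math

def prob_correct(ratio_100, n_a):
    """Estimated probability that the input signal has the sign of its mean.

    ratio_100 -- mu^2 / sigma^2 for a system of 100 A-units
    n_a       -- number of A-units in the system being estimated
    """
    ratio = ratio_100 * n_a / 100.0          # the ratio grows linearly with N_a
    z = math.sqrt(ratio)                     # |mu| / sigma
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
```

For a ratio of 2.912 at 100 A-units (the largest entry of Table 3), this gives a probability of roughly 0.956, and a probability above 0.999 at 400 A-units.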


Note that the above equations do not depend on whether the
perceptron is constructed according to the binomial model, Poisson model,
or any other model, so long as the A-units are selected independently
of one another. The performance does depend on the $Q$-functions, however,
which will be different for different models. From Equation (7.3) it is clear
that any stimulus $S_x$ will tend to be classified correctly if the average value
of $Q_{jx}$ for $S_j$ in the same class as $S_x$ is greater than the average value
of $Q_{jx}$ for $S_j$ in the opposite class from $S_x$. (If the frequencies $P_j$
are not all equal, each $Q_{jx}$ must be multiplied by its appropriate frequency
in obtaining these averages.) From the analysis of $Q$-functions in the
preceding chapter, it is clear that this condition will generally be met if
the stimuli of each class have large intersections with one another (on
the retina) while stimuli from opposite classes have small intersections
with one another. The ideal situation would consist of two disjoint clusters
of stimuli, located in different parts of the retinal field, each cluster
representing one class. In order to discriminate two stimuli reliably
(i.e., to assign them to opposite classes) it is desirable that $Q_{ij}$ for
the two stimuli should be small, and particularly that the conditional
probabilities of a response to one stimulus, given a response to the other,
should be as small as possible. Figure 10,
in the last chapter, shows that this condition can readily be met if the
stimuli have a small intersection with one another, but becomes increasingly
difficult to meet as the intersection increases. This figure also shows that
a binomial model is better suited to the discrimination of similar stimuli
than a Poisson model, where $Q_{ij}$ is apt to be relatively large even
for disjoint stimuli.

7.1.3  Fixed  Sequence  Experiments:  Examples 

The environment which was considered in the last section of
Chapter 6, involving twenty horizontal bars and twenty vertical bars on a
20 by 20 toroidally connected retina, is a convenient one to use for a
"calibration experiment", by which different classes of perceptrons can
be compared. In particular, consider the following discrimination
experiment:

EXPERIMENT 1: Given a perceptron with 400 sensory points arranged in
a 20 by 20 toroidally connected array, or "retina", let $W$ consist of the
twenty possible 4 by 20 horizontal bars, and the twenty possible 4 by 20
vertical bars. Let $C(W)$ be a classification which assigns every
horizontal bar to the positive class, and every vertical bar to the negative
class. Show every bar in $W$ to the perceptron exactly once (or in a
sequence with $P_j$ equal for all stimuli). During this training sequence,
the perceptron is reinforced with S-controlled reinforcement. Then
select one of the bars, $S_x$, and determine whether the response is
correct, according to $C(W)$.
Figure 13  PROBABILITY OF CORRECT IDENTIFICATION OF A TEST STIMULUS BY AN
ELEMENTARY α-PERCEPTRON, IN EXPERIMENT 1 (CURVES ALSO APPLY TO
γ-PERCEPTRONS; SEE CHAPT. 8). HORIZONTAL AXIS: NUMBER OF ASSOCIATION UNITS ($N_a$).

Figure 14  PROBABILITY OF CORRECT IDENTIFICATION OF A TEST STIMULUS BY AN
ELEMENTARY α-PERCEPTRON, IN EXPERIMENT 2 (FOR TWO BINOMIAL MODELS).
CURVES ALSO APPLY TO γ-PERCEPTRONS (SEE CHAPT. 8). HORIZONTAL AXIS: NUMBER OF ASSOCIATION UNITS ($N_a$).


Table 3 shows the performance ratios, $\mu^2/\sigma^2$, for a 100
A-unit binomial model α-perceptron, with various combinations of the
parameters $x$ and $y$ ($\theta = 2$ in all cases). The parameters $x = 3$,
$y = 1$, $\theta = 2$ appear to be optimum for this experiment, as can be
seen from the table. (Increasing the threshold results in a definite drop
in performance.) Figure 13 shows the performance of several binomial
and Poisson model perceptrons as a function of $N_a$, computed from
Equation (7.7). The top curve shows the performance of the optimum
(binomial) system. A comparison of the other two curves illustrates the
relatively poor performance of the Poisson model on this particular problem.

It should be emphasized that the parameters found to be optimum
in this experiment will not necessarily turn out to be optimum in other
environments, or other classifications. In general, it appears that as the
classes of patterns to be discriminated become more "similar" (i.e., as
the maximum possible overlap between stimuli from opposite classes
increases), the optimum number of connections to an A-unit and the optimum
value of $\theta$ tend to increase.

A more difficult classification of the same dichotomy has been
studied in the following experiment:

EXPERIMENT 2: With the same environment as in Experiment 1, number
the horizontal and vertical bars consecutively according to their position on
the retina. Let the classification $C(W)$ place all even numbered bars in
the positive class, and all odd numbered bars in the negative class. The
training and testing procedures are identical to Experiment 1.
TABLE 3

PERFORMANCE RATIOS $[\mu^2(u_x)/\sigma^2(u_x)]$ FOR 100-A-UNIT ELEMENTARY α-PERCEPTRONS
(BINOMIAL MODEL) FOR EXPERIMENT 1 (HORIZONTAL/VERTICAL BAR DISCRIMINATION,
FIXED SEQUENCE). θ = 2 IN ALL CASES.

                      x (NUMBER OF EXCITATORY CONNECTIONS PER A-UNIT)
                           2        3        4        5
y                 0      2.474    2.831    1.540     .931
(NUMBER OF        1      2.063    2.912    2.104    1.349
INHIBITORY        2      1.708    2.805    2.479    1.773
CONNECTIONS       3      1.406    2.592    2.670    2.140
PER A-UNIT)       4      1.153    2.329    2.708    2.414
                  5       .941    2.006    2.630    2.579
                  6       .767    1.777    2.473    2.638
                  7       .623    1.523    2.271    2.605


TABLE 4

PERFORMANCE RATIOS FOR 100-A-UNIT ELEMENTARY α-PERCEPTRONS
(BINOMIAL MODEL) FOR EXPERIMENT 2. θ = 2 IN ALL CASES.

                      x (NUMBER OF EXCITATORY CONNECTIONS)
                           2        3        4        5
y                 0       .358     .426     .328     .274
(NUMBER OF        1       .365     .502     .436     .363
INHIBITORY        2       .362     .551     .526     .451
CONNECTIONS)      3       .350     .578     .596     .533
                  4       .333     .585     .646     .605
                  5       .310     .578     .677     .664
                  6       .285     .558     .690     .707
                  7       .268     .529     .688     .736

In this case, the two most similar bars to any test bar (those
which overlap it by 3/4 of its area on either side) are invariably in the
opposite class. Nonetheless, all stimuli may be positive stimuli under
these conditions, with a suitable choice of parameters. Table 4 shows the
$\mu^2/\sigma^2$ ratio for a 100 unit system in this experiment. Figure 14 shows the
performance of a perceptron with the same parameters as before ($x = 3$, $y = 1$,
$\theta = 2$) on this experiment, and also with the best parameters found to date
($x = 5$, $y = 7$, $\theta = 2$). These parameters are the best set for $x \le 5$ and $y \le 7$,
but are probably not optimum, as it seems likely that a further increase in
both $x$ and $y$ would yield a further improvement in performance.
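
The overlap structure that makes Experiment 2 difficult can be verified in a few lines. This is a small sketch under one assumption that is not spelled out in the text: bars of a given orientation are numbered by the position of their first row (or column), so that two parallel bars share a class exactly when their positions differ by an even number. The function names are illustrative.

```python
SIZE = 20    # rows (or columns) on the toroidal retina
WIDTH = 4    # each bar covers WIDTH adjacent rows (or columns)

def overlap_fraction(offset):
    """Fraction of a bar's area shared with a parallel bar displaced by `offset`."""
    rows_a = {r % SIZE for r in range(WIDTH)}
    rows_b = {(r + offset) % SIZE for r in range(WIDTH)}
    return len(rows_a & rows_b) / WIDTH

def same_class(offset):
    """Assumed parity rule: same class only if the displacement is even."""
    return offset % 2 == 0
```

A displacement of one position in either direction (`offset` of 1 or 19) gives an overlap of 0.75 with a bar of the opposite class, while the nearest bars of the same class (`offset` of 2 or 18) overlap by only 0.5.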

7.1.4  Random  Sequence  Experiments:  Analysis 

For the analysis of the performance of perceptrons trained
with random stimulus sequences, it is convenient to make use of an
unnormalized G-matrix (see footnote, page 75; the defining expressions
are illegible in the source). For such a matrix, in the α-system,
$g_{ij}$ = the number of units active for both $S_i$ and $S_j$, or

$$g_{ij} = \sum_k a_k^*(i) \, a_k^*(j) \tag{7.8}$$

The mathematical properties of the unnormalized G-matrix are no different
from those discovered for the normalized matrix, in Chapter 5.

In a random sequence experiment, the training sequence is
assumed to consist of a series of $T$ stimuli, in which each stimulus in
the series is selected independently of the others. The probability of
selecting stimulus $S_j$ for the $t$th position in the sequence is $p_j$,
for all $t$. We will let $m_j$ = the number of times stimulus $S_j$ occurs
in the training sequence. The random vector $m = (m_1, \ldots, m_n)$ will have
a multinomial distribution with $T$ trials and probability vector
$p = (p_1, \ldots, p_n)$. The training sequence selected is assumed to be
independent of the particular perceptron selected for a given experiment.
At the end of the training sequence, the input to the R-unit in response to
a test stimulus $S_x$ will be

$$u_x = \sum_j m_j \, \rho_j \, g_{jx}$$

Therefore, the expected value over perceptrons and training sequences is

$$E(u_x) = N_a T \sum_j p_j \, \rho_j \, Q_{jx}$$

which is of the order of $N_a T$. Note that this is identical to equation (7.3).

The variance over both perceptrons and training sequences is
given by

$$\sigma^2(u_x) = \sum_j \sum_k \rho_j \rho_k \left[ E(m_j m_k) \, E(g_{jx} g_{kx}) - E(m_j) E(m_k) \, E(g_{jx}) E(g_{kx}) \right] \tag{7.10}$$
For the components of the multinomially distributed vector $m$ we have

$$E(m_j) = T p_j$$

$$E(m_j^2) = T(T-1) \, p_j^2 + T p_j$$

$$E(m_j m_k) = T(T-1) \, p_j p_k \quad (j \ne k)$$
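
These multinomial moments can be confirmed exactly for a small case by enumerating every possible training sequence. The sketch below is a brute-force check, practical only for short sequences and few stimuli; it uses exact rational arithmetic so that the comparison involves no rounding. The function name is illustrative.

```python
from fractions import Fraction
from itertools import product

def exact_moments(p, T, j, k):
    """Exact E(m_j), E(m_j^2) and E(m_j m_k) over all length-T sequences.

    p -- list of Fractions, the selection probability of each stimulus
    """
    e_j = e_jj = e_jk = Fraction(0)
    for seq in product(range(len(p)), repeat=T):
        prob = Fraction(1)
        for s in seq:
            prob *= p[s]                 # positions are selected independently
        m_j, m_k = seq.count(j), seq.count(k)
        e_j += prob * m_j
        e_jj += prob * m_j * m_j
        e_jk += prob * m_j * m_k
    return e_j, e_jj, e_jk
```

For example, `exact_moments([Fraction(1, 2), Fraction(1, 3), Fraction(1, 6)], 4, 0, 1)` returns 2, 5 and 2, in agreement with the three expressions above.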

Let $n_{jk \ldots x}$ = number of A-units active for stimuli $S_j, S_k, \ldots, S_x$.
A bar over a subscript will be used to denote negation (e.g.,
$n_{j\bar{x}}$ = the number of A-units active for stimulus $S_j$ but not for $S_x$:
$n_{j\bar{x}} = n_j - n_{jx}$). From equation 7.8, it is clear that for the α-system,
$n_{jx} = g_{jx}$. Now, any set of $n$'s which is exhaustive (every A-unit counted
in at least one $n$), and such that each A-unit is counted in no more
than one $n$, will have a multinomial distribution. From this it
follows that

$$E(g_{jx}) = E(n_{jx}) = N_a Q_{jx}$$

$$E(g_{jx} \, g_{kx}) = E(n_{jx} \, n_{kx}) = N_a Q_{jkx} + N_a (N_a - 1) \, Q_{jx} Q_{kx}$$
Substituting in (7.10), this yields

$$\sigma^2(u_x) = N_a T \left\{ \sum_j p_j \left[ (N_a - 1) \, Q_{jx}^2 + Q_{jx} \right] + \sum_j \sum_k \rho_j \rho_k \, p_j p_k \left[ (T-1) \, Q_{jkx} - (N_a + T - 1) \, Q_{jx} Q_{kx} \right] \right\} \tag{7.11}$$

The variance of $u_x$ is therefore on the order of $N_a^2 T + N_a T^2$, at
maximum. Since the square of the mean is on the order of $N_a^2 T^2$, the
ratio $\mu^2/\sigma^2$ becomes indefinitely large as $N_a$ and $T$ both increase,
and the Theorem stated in Section 7.1.2 is seen to hold for random training
sequences of sufficient length, as well as fixed sequences. As the length of
the training sequence, $T$, increases, the relative frequencies $m_j/T$ will
approach the probabilities $p_j$, and the performance of the system will
approach the performance in a fixed sequence experiment. As $N_a$ goes to
infinity, the ratio $\mu^2/\sigma^2$ approaches

$$\frac{T \left( \sum_j p_j \rho_j Q_{jx} \right)^2}{\sum_j p_j Q_{jx}^2 - \left( \sum_j p_j \rho_j Q_{jx} \right)^2}$$


7.1.5  Random  Sequence  Experiments:  Examples

As a "calibration experiment" for comparing different
systems, the horizontal vs. vertical bar discrimination problem is
particularly convenient. The random sequence version of the experiment is as
follows:

EXPERIMENT 3: For the same conditions and classification as Experiment 1,
show the perceptron a random sequence of horizontal and vertical
bars, each bar occurring with equal frequency ($p_j = 1/40$ for all bars).
During this training sequence, S-controlled reinforcement is used, and the
performance of the perceptron for an arbitrary bar, $S_x$, is then
determined as before.

Figure 15 shows the performance of binomial model α-perceptrons of
three different sizes on this problem, as a function of the length of the
training sequence ($T$). The parameters $x$, $y$, and $\theta$ are the optimum
values (3, 1, 2) found in Section 7.1.3. Further increases in $N_a$ will not
appreciably improve performance in this experiment.

The effect of a "frequency bias" on α-system perceptrons
is illustrated in the following experiment:

EXPERIMENT 4: The conditions and classifications are the same as in
Experiment 3, but the horizontal bars occur four times as frequently as
the vertical bars; i.e., $p_j = .04$ for horizontal bars and $.01$ for vertical
bars.

Figure 16 shows the performance of a 100 A-unit system on this experiment.
The upper curve shows the probability of correctly identifying a horizontal
bar, and the lower curve shows the probability of correctly identifying a
vertical bar. The correct response to vertical bars is actually suppressed
as training increases, due to the greater frequency of horizontal bars. The
broken curve shows the mean performance on both classes, with test
stimuli drawn from each class with their appropriate frequencies. In the
following chapter, it will be seen that this performance can be considerably
improved in a γ-system perceptron. It would also be improved for an
α-perceptron if error correction training were employed instead of
S-controlled reinforcement.

Figure 15  PROBABILITY OF CORRECT IDENTIFICATION OF TEST STIMULUS BY BINOMIAL
α-PERCEPTRONS IN EXPT. 3 (RANDOM SEQUENCES)
($x$ = 3, $y$ = 1, $\theta$ = 2)

Figure 16  PROBABILITY OF CORRECT IDENTIFICATION OF TEST STIMULI IN EXPT. 4.
BINOMIAL α-PERCEPTRON WITH $N_a$ = 100, $x$ = 3, $y$ = 1, $\theta$ = 2.
$p_j$ = .04 FOR HORIZONTAL BARS; .01 FOR VERTICAL BARS

7.2  Discrimination  Experiments  with  Error  Correction  Procedures 


The  analysis  and  experiments  in  the  preceding  section  deal  with 
S-controlled  reinforcement  experiments.  In  Chapter  5,  Theorem  6,  it  was 
shown  that  this  procedure  cannot  be  guaranteed  to  yield  a  solution  to  a 
classification  problem,  even  though  a  solution  may  exist,  whereas  an  error 
correction  procedure  will  always  yield  a  solution  if  any  solutions  exist.  The 
error  correction  procedure  would  therefore  seem  to  be  the  method  of  choice 
in  training  a  perceptron  to  discriminate  between  two  classes  of  stimuli. 
Unfortunately,  the  type  of  analysis  which  was  carried  out  for  S-controlled 
experiments is not readily performed with error-correction experiments.
Consequently, all data on learning curves for error correction procedures
come from one of two sources: simulation on a digital computer,* and
performance of actual experiments on the Mark I perceptron at the Cornell
Aeronautical Laboratory (Refs. 29, 30, 31).


* Experiments performed by Carl Kesler on the Burroughs 220 computer
at Cornell University.
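
For comparison with the S-controlled rule, an error correction procedure for an α-system can be sketched as follows. This is an illustrative reading of the procedure, not the program used in the reported simulations: it assumes unit-magnitude corrections applied to the active A-units only when the response is wrong, treats a zero input as an error, and stops after the first full pass without errors. The function name and data layout (`activity[j]` is the tuple of 0/1 A-unit activities for stimulus `j`) are hypothetical.

```python
def train_error_correction(activity, rho, max_passes=100):
    """Cycle through the stimuli, correcting only after wrong responses.

    Returns (v, passes), where `passes` counts the passes made up to and
    including the first error-free one, or None if no error-free pass
    occurred within `max_passes`.
    """
    v = [0] * len(activity[0])
    for passes in range(1, max_passes + 1):
        errors = 0
        for a, r in zip(activity, rho):
            u = sum(a_i * v_i for a_i, v_i in zip(a, v))
            if u * r <= 0:               # wrong sign, or undecided (assumption)
                errors += 1
                # Move the active connections toward the required response.
                v = [v_i + r * a_i for a_i, v_i in zip(a, v)]
        if errors == 0:
            return v, passes
    return v, None
```

On the toy set `activity = [(1, 0, 1), (0, 1, 1), (1, 1, 0)]` with `rho = [1, -1, 1]`, the procedure reaches an error-free pass after a few cycles, whereas a single round of S-controlled reinforcement on the same set leaves the second stimulus with zero input.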
Two main sets of experiments will be described here, the first
with binomial model α-perceptrons, and the second with perceptrons
having additional constraints imposed on their S to A-unit connections.

7.2.1  Experiments  with  Binomial  Models 


The  following  four  experiments  have  been  performed  with 
binomial  model  perceptrons  (having  fixed  numbers  of  sensory  connections 
to  each  A-unit,  with  origins  located  at  random  in  the  sensory  mosaic): 

EXPERIMENT  5:  The  environment  of  horizontal  and  vertical  bars  used 
in  Experiment  1  is  employed,  and  the  stimuli  occur  in  fixed  sequence,  first 
showing  all  horizontal  bars  in  fixed  sequence,  then  all  vertical  bars,  and 
repeating  the  sequence  until  perfect  performance  is  achieved.  The  error 
correction  procedure  is  employed,  and  the  performance  is  tested  at  the 
end  of  each  sequence. 

EXPERIMENT 6: The same environment and training procedure are
employed  as  above,  but  the  stimuli  occur  in  a  random  sequence,  with 
for  each  stimulus  (as  in  Experiment  3). 

EXPERIMENT  7:  The  environment  consists  of  a  set  of  triangles  in  all 
possible  positions  on  a  toroidally  connected  20  by  20  retina,  and  a  set  of 
squares  in  all  possible  positions  on  the  retina.  The  triangles  and  squares 
each  cover  80  of  the  400  retinal  points.  The  sequence  is  random,  as  in 
Experiment  6,  with  -  ■'  for  each  stimulus.  (The  set  of  possible 

stimuli is generated by translations of a standard image; rotations are not
permitted.)

EXPERIMENT 8: The horizontal/vertical bar environment is employed, as
in Experiment 6, with stimuli occurring in random sequence. A random
sign correction procedure is employed for training the perceptron (see
Definition, Section 5.6).

Figure 17 PERFORMANCE OF BINOMIAL α-PERCEPTRONS IN EXPERIMENTS 5 AND 6
(HORIZONTAL / VERTICAL BAR DISCRIMINATION WITH ERROR CORRECTION
PROCEDURE). SOLID CURVES SHOW MEAN PERFORMANCE OF 25 PERCEPTRONS,
WITH N_a = 300, x = 3, y = 1, θ = 2
[Curve labels: FIXED TRAINING SEQUENCE; RANDOM SEQUENCE; THEORETICAL CURVE FOR
S-CONTROLLED RANDOM SEQUENCE EXPERIMENT. Abscissa: NO. OF TRAINING STIMULI (T)]

Figure 18 PERFORMANCE OF BINOMIAL α-PERCEPTRONS IN SQUARE / TRIANGLE DISCRIMINATION
(EXPT. 7) COMPARED WITH HORIZONTAL / VERTICAL BAR DISCRIMINATION (EXPT. 6)

Figure  17  shows  the  results  of  Experiments  5  and  6,  and  includes 
a  theoretical  learning  curve  for  an  S-controlled  experiment  for  comparison. 
The  experimental  curves  show  the  mean  performance  for  a  set  of  25  binomial 
perceptrons with 300 A-units, and the optimum parameters (x = 3, y = 1,
θ = 2) found in the preceding section. The same 25 perceptrons were
employed  in  Experiments  5  and  6.  It  appears  to  be  characteristic  that  a 
random  training  sequence  leads  to  a  more  rapid  learning  rate  initially,  but 
is  overtaken  by  the  fixed  sequence  performance  as  the  duration  of  training 
increases.  Note  that  in  both  cases,  the  error  correction  method  yields 
considerably  better  performance  than  the  S-controlled  method. 

Figure  18  shows  the  mean  performance  of  a  set  of  15  perceptrons 
on  Experiment  7.  The  parameters  are  ;  ■  ,  '  y  , 

h'  =  3  .  These  were  the  best  parameters  tested,  but  are  probably  not 

optimum.  The  learning  curve  for  the  horizontal/vertical  bar  experiment 
(Experiment  6)  is  shown  as  a  broken  line  for  comparison.  The  slow  learning 
rate  in  this  experiment  is  largely  due  to  the  large  number  of  distinct  stimuli 
in the environment (800) compared to the number in the horizontal/vertical
bar  environment  (40).  The  increased  number  of  stimuli  means  that  a  much 
longer  training  sequence  is  required  to  guarantee  a  representative  sample 
of  all  stimuli,  with  a  reasonably  uniform  coverage  of  the  retinal  field.  A 
further  difficulty  is  introduced  by  the  fact  that  the  maximum  overlap  of  a 
square  and  triangle  is  much  greater  than  the  maximum  overlap  of  a  horizontal 
and  vertical  bar,  making  the  discrimination  intrinsically  more  difficult. 


Figure  19  shows  a  comparison  of  the  performance  of  10 
perceptrons  on  Experiment  8  with  the  performance  of  the  same  10  perceptrons 
on  Experiment  6.  In  Experiment  8,  the  learning  is  not  only  much  slower,  but 
the variability between perceptrons is greatly increased. Of the ten
perceptrons tested, two achieved perfect performance during the period of the
experiment, which was discontinued after 2000 training stimuli. Nonetheless,
each of the ten perceptrons would ultimately achieve perfect performance if
the experiment were continued (due to Theorem 5, Section 5.6). With the
directed error correction procedure, all ten perceptrons achieved perfect
performance  within  300  training  stimuli. 


While  the  performance  of  an  elementary  perceptron  with  the 
random  sign  procedure  is  clearly  unsatisfactory  for  practical  systems,  it 
should  be  noted  that  the  existence  of  a  consistent  bias  in  the  proper  direction 
still  makes  this  a  plausible  component  of  a  more  reliable  mechanism.  If  a 
"majority  mechanism"  is  employed  (e.g.,  a  threshold  device  which  responds 
to  the  difference  of  positive  and  negative  signals  from  R-units) 
to  determine  the  "majority  vote"  of  n  such  elementary  perceptrons, 
connected  independently  to  the  same  retina,  a  highly  reliable  system  would 
result.  The  error  probability  of  this  system  would  be: 


$$\sum_{k < n/2} \binom{n}{k} P^{k} (1 - P)^{n-k}$$

where P is the probability of correct response for a single perceptron
(as shown in Figure 19).

Figure 19 COMPARISON OF RANDOM-SIGN CORRECTION PROCEDURE (EXPT. 8) WITH "NORMAL"
ERROR-CORRECTION PROCEDURE (EXPT. 6). SHOWS MEANS OF 10 BINOMIAL
α-PERCEPTRONS WITH N_a = 300, x = 3, y = 1, θ = 2
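
The reliability gained from the majority mechanism discussed above can be checked numerically. The sketch below is an illustration under stated assumptions: the n perceptrons are taken to err independently, n is taken to be odd so that no ties occur, and the error probability is computed as the binomial probability that fewer than half of the perceptrons respond correctly. The function name `majority_error` is hypothetical.

```python
from math import comb


def majority_error(p_correct, n):
    """Probability that a majority vote of n independent voters is wrong.

    p_correct: probability of a correct response for a single perceptron.
    n: number of perceptrons (assumed odd, so that ties cannot occur).
    """
    if n % 2 == 0:
        raise ValueError("n must be odd in this sketch")
    # The vote is wrong when at most (n - 1) / 2 voters are correct.
    return sum(comb(n, k) * p_correct**k * (1 - p_correct)**(n - k)
               for k in range((n + 1) // 2))
```

Even a modest bias toward the correct response drives the error probability down rapidly as n grows, which is the point of the argument in the text.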


While  the  actual  learning  curve  for  error  correction  experiments 
cannot  at  present  be  stated  analytically,  R.  D.  Joseph  has  obtained  an  upper 
bound  for  the  number  of  corrective  reinforcements  that  must  be  applied, 
where  a  solution  exists.  In  the  proof  of  Theorem  4,  Chapter  5,  it  was  noted 
that  an  upper  bound  for  the  number  of  corrective  reinforcements  can  be 
expressed  in  terms  of  the  quantity  ,  as  follows; 


(7.12) 


where 


maximum  diagonal  element  of  the  G-matrix, 

minimum  of  the  function  F'  --  r  //b  ||  / ']  (as  defined  for 

Theorem  4,  Chapter  5). 

ij  '  (as  in  Theorem  4,  Chapter  5). 


For  the  case  which  is  of  primary  interest  here,  the  process 
starts  from  the  origin,  so  that  <  ^  In  this  case,  (7.12) 

simplifies  to 


7.2.2  Experiments  with  Constrained  Sensory  Connections 


In  all  perceptrons  considered  thus  far,  connections  from  S-units 
to  A-units  have  had  their  origins  randomly  chosen  from  the  set  of  all  sensory 
points,  with  equal  probability.  Such  models  will  be  called  uniform  input 
distribution  models  (u.i.d.  models).  It  has  occasionally  been  proposed  that 
the performance of a perceptron might be considerably improved by the
introduction of special constraints on the admissible origin point connections.
For  example,  the  retinal  connections  could  be  made  to  resemble  biological 
systems  more  closely  by  assigning  a  "retinal  field"  to  each  A-unit,  and 
limiting  its  choice  of  origin  points  to  S-units  within  this  field.  A  similar 
procedure  would  be  to  construct  a  network  of  connections  by  assigning  a 
center  at  random  to  each  A-unit,  somewhere  on  the  retina,  and  selecting 
connections  from  a  circular  normal  distribution  about  this  center.  Such 
systems  will  be  called  normal  input  distribution  models  (n.i.d.  models). 
Further  constraints  might  lead  ultimately  to  specialized  A-units,  whose 
input  configurations  are  specially  designed  to  make  them  responsive  to 
stimuli  of  particular  shapes,  or  configuration  properties.  We  will  consider 
one  further  constraint  in  this  section:  the  case  in  which  the  excitatory  and 
inhibitory  connections  to  an  A-unit  are  assigned  distinct  centers  on  the 
retina,  with  origins  selected  from  a  circular  normal  distribution  about 
these  centers.  This  will  be  called  the  divided  input  distribution  (d.i.d.) 
model.  The  n.i.d.  model  can  be  considered  a  special  case  of  the  d.i.d. 
model  in  which  the  excitatory  and  inhibitory  centers  and  dispersions  are 
identical.
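
The construction of a d.i.d. unit can be illustrated with a short sampling routine. This is a sketch under several assumptions that the text does not specify: the retina is taken to be a toroidal square grid, the excitatory center is placed uniformly at random, the inhibitory center is placed at a normally distributed distance from it in a uniformly random direction, and sampled origin points are rounded to the nearest retinal point. The function name `sample_did_unit` and its parameter names are hypothetical.

```python
import math
import random


def sample_did_unit(size, x, y, mean_dist, sd_dist, sd_exc, sd_inh, rng):
    """Sample origin points for one A-unit on a toroidal size-by-size retina.

    x, y: numbers of excitatory and inhibitory connections.
    mean_dist, sd_dist: mean and standard deviation of the center separation.
    sd_exc, sd_inh: dispersions about the excitatory and inhibitory centers.
    Returns (excitatory_points, inhibitory_points) as lists of (row, col).
    """
    def wrap(value):
        # Round to the nearest retinal point and wrap around the torus.
        return int(round(value)) % size

    def draw(center, sd, count):
        # Circular normal distribution: independent normals on each axis.
        return [(wrap(rng.gauss(center[0], sd)), wrap(rng.gauss(center[1], sd)))
                for _ in range(count)]

    exc_center = (rng.randrange(size), rng.randrange(size))
    # Assumption: random direction, normally distributed separation.
    dist = rng.gauss(mean_dist, sd_dist)
    angle = rng.uniform(0.0, 2.0 * math.pi)
    inh_center = (exc_center[0] + dist * math.cos(angle),
                  exc_center[1] + dist * math.sin(angle))
    return draw(exc_center, sd_exc, x), draw(inh_center, sd_inh, y)
```

Setting the separation to zero and using equal dispersions gives an n.i.d. unit as a special case, as noted in the text.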

In the general d.i.d. model, A-units are characterized by
seven parameters: x, y, and θ as before, the expected distance
between excitatory and inhibitory centers ($\bar{D}$), the standard deviation
of this distance ($\sigma_D$), and the standard deviations of the normal
probability distributions about the excitatory and inhibitory centers
($\sigma_x$ and $\sigma_y$).
A number of experiments have been performed with such models in an
attempt to discover what sort of improvement might be achieved by an
optimum set of constraints on the sensory connections.

Experiments 6 and 7 have been used for the study of constrained
input distributions. In the square/triangle discrimination experiment
(Experiment 7) the performance of the d.i.d. models never showed any
improvement  over  the  original  u.i.d.  model.  A  large  number  of  combi¬ 
nations  of  X  ,  (/  ,  and  ‘)  were  tested  with  various  distribution  para¬ 
meters,  in  an  attempt  to  find  the  optimum  system  for  ^  ^  10 

The  best  performance  was  obtained  for  a  set  of  15  perceptrons  with  X,  -  , 

/  ~  ,  t-’  -  s  .  iD  -  ,  o  D  O  ,  o'y  ■/  ,  and  ^  7 

This  is  equivalent  to  an  n.i.d.  model  with  the  same  centers  for  excitatory 
and  inhibitory  distributions,  and  O  -  7  .  The  performance  of  this  system 

did  not  differ  from  that  of  the  equivalent  u.i.d.  model  by  more  than  1%  at 
any  point  on  the  learning  curve,  and  was  within  1/4%  of  the  u.i.d.  performance 
at  most  of  the  points  tested.  The  same  stimulus  sequences  were  used  for 
both  models  in  order  to  make  conditions  as  closely  comparable  as  possible. 
These  results  suggest  that  for  large  but  spatially  concentrated  stimulus 
patterns,  little  advantage  is  to  be  gained  in  an  elementary  perceptron  by 
imposing  radial  constraints  on  the  origin  point  configurations. 

In the case of the horizontal/vertical bar discrimination
(Experiment  6)  a  slight  advantage  was  found  for  the  d.i.d.  model  for  the 
parameters  ''  :  ,  /  f  ,  <  ,  o  L  7  ,  ■x'x  -  T  ,  o'y  =  V. 

On the basis of a number of simulation experiments, this appears to be
close to an optimum configuration for the d.i.d. model for this experiment.
Figure 20 shows the results obtained from 25 runs with these
parameters, compared with 25 u.i.d. models with optimum parameters
'  '  .  F  -  ■'  using  the  identical  training  sequences.  The 

difference, although slight, appears to be statistically significant.

Figure 20 COMPARISON OF OPTIMUM d.i.d. AND u.i.d. MODELS IN HORIZONTAL / VERTICAL
BAR DISCRIMINATION (EXPT. 6). CURVES SHOW MEANS OF 25 RUNS

The general conclusion from these experiments seems to be
that (for large stimuli) little is to be gained from special constraints which
affect only the dispersion, rather than the geometric form, of origin point
patterns in elementary perceptrons. A further variation of the model, in
which elliptical rather than circular distributions of origin points are employed,
might be more sensitive to contours and directions of elongation in the stimuli.
No  quantitative  results  are  available  on  such  a  model  at  this  time. 

7.3  Discrimination  Experiments  with  R-controlled  Reinforcement 


In  an  experimental  system  with  R-controlled  reinforcement 
(Definition  39)  the  reinforcement  control  system  receives  information  about 
the  outputs  of  the  perceptron,  but  receives  no  information  directly  from  the 
environment. Such experiments are of interest in determining the "spontaneous
organization" tendencies of perceptrons. It is readily seen, from
theoretical considerations, that the performance of an elementary
α-perceptron in such experiments is unlikely to be of psychological interest.
In an α-perceptron, all $g_{ij}$ are generally greater than zero, so that
whatever response is associated to the first stimulus in a training sequence
will tend to generalize to all other stimuli in the environment. Consequently,
the perceptron, left to its own devices without any attempt to
change its responses, will tend to form a classification C(W) in which
all stimuli in W are either in the positive class or else all in the negative
class, with equal probability.


See Section 23.1.2 for a reconsideration of this problem from the
standpoint of sensory analyzing mechanisms.

In Ref. 82, such systems have been called "Class C perceptrons".

Two special cases are of interest, in which it is possible for
a dichotomy to be formed with both classes non-empty. In the first case,
some of the $g_{ij}$ coefficients are zero. This might occur in a system
with high thresholds on the A-units, so that some pairs of stimuli activate
no A-units in common. If $S_i$ and $S_j$ are two such stimuli, then if
$S_i$ is the first stimulus and $S_j$ is the second stimulus in the training
sequence, it is perfectly possible that one will become associated to a
positive response, and the other to a negative response. If these are the
only two stimuli, or if there is no positive generalization from any of the
stimuli which become associated to one class to the stimuli of the second
class, this dichotomy may be stable. In general, however, one class is
apt to become dominant, eventually pulling all stimuli into a single class
as before. The second case in which a dichotomy might be formed is that
in which the values are not initially all zero, but are distributed with some
connections negative and some positive. In this case, the generalization
from the first stimulus will not necessarily wipe out an initial bias in the
opposite direction, and it is possible that a dichotomy will be formed.

While it is possible for dichotomies to be formed in the special
cases mentioned above, there is little reason to suppose that such dichotomies
would ever be of interest to a human observer. If the stimuli are
uniformly distributed on the retina, or uniformly clustered about the
center of the field, the $g_{ij}$ coefficients which happen to be zero will
generally be unrelated to possible "meaningful" classifications of the
stimuli, so that any division into two classes will tend to be random,
and unrelated to any concept of "intrinsic similarity" of the stimuli. Thus
it is clear that in an elementary α-perceptron, psychologically meaningful
discriminations can be achieved only under the control of an experimenter,
or r.c.s., which is capable of evaluating the correctness of the
perceptron's responses according to some predetermined scheme. In the
γ-systems, which are considered in the following chapter, somewhat
more interesting performances in R-controlled experiments are likely to
occur.

7.4  Detection  Experiments 

In  discrimination  experiments,  such  as  those  considered  in 
the  previous  sections,  the  perceptron  is  required  to  give  one  of  two  responses 
to  designate  which  of  two  well-defined  classes  of  patterns  is  present.  It  is 
assumed  that  one  of  the  two  is  always  present,  and  that  nothing  else  is 
present  which  might  confuse  the  picture.  In  detection  experiments,  a 
single pattern, or class of patterns, is taught to the perceptron as the "positive
class", and anything else (such as noisy fields, arbitrary patterns, etc.) is
considered to belong to the "negative class". Moreover, the positive pattern
may appear with an admixture of background noise, irrelevant lines, or
other sensory material. While such detection experiments differ considerably
in their "psychological" character from discrimination experiments, from a
theoretical standpoint they represent a special case of discrimination experiments
in which the training and the two classes of stimuli are highly asymmetric,
the positive class generally being smaller but more thoroughly trained
than the negative class. Two cases are of interest: detection in noisy
environments, and detection in organized environments. These are
considered separately in the following sections.

7.4.1 Detection in Noisy Environments


A  noisy  environment  will  be  defined  as  the  product  set  of  a 
set  of  well-defined  stimulus  patterns  (including  an  empty  field  as  a  stimulus) 
and  a  set  of  "random  noise  patterns"  superimposed  on  the  members  of  the 
first set. The random noise patterns are generated by applying signals of
random polarity (positive or negative with .5 probability) to a randomly
selected set of S-units, each chosen independently with a fixed probability.
This probability will be called the noise density of the environment, and
represents the expected value of the proportion of S-points which emit
random signals at any given moment of time.
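
The generation of a noise pattern as just defined can be sketched as follows. This is an illustration only: S-points are indexed by integers, and the rule used to superimpose noise on a stimulus (a positive noise signal turns a point on, a negative one turns it off) is an assumption made for the sketch. The names `noise_pattern` and `superimpose` are hypothetical.

```python
import random


def noise_pattern(n_points, density, rng):
    """Return {s_point: polarity} for the S-points that emit noise signals.

    Each S-point is selected independently with probability `density`;
    its polarity is +1 or -1 with probability .5 each.
    """
    pattern = {}
    for point in range(n_points):
        if rng.random() < density:
            pattern[point] = 1 if rng.random() < 0.5 else -1
    return pattern


def superimpose(stimulus, noise):
    """Combine a stimulus (set of 'on' S-points) with a noise pattern.

    Assumption: positive noise turns a point on, negative noise turns it off.
    """
    result = set(stimulus)
    for point, polarity in noise.items():
        if polarity > 0:
            result.add(point)
        else:
            result.discard(point)
    return result
```

Superimposing a noise pattern on the empty set gives a member of the negative class; superimposing it on a positive stimulus gives a member of the positive class.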

Note that a noisy environment is, in its entirety, a well defined
set of stimuli, with a probability associated with each stimulus.
Such  an  environment  consists  of  two  classes:  a  positive  class,  in  which  one 
of  the  "positive  stimuli"  (e.g.,  a  geometric  form)  is  present  in  combination 
with  one  of  the  noise  patterns,  and  a  negative  class,  consisting  of  the  noise 
patterns alone, or the "empty field" stimulus with a noise pattern superimposed.
The task of the perceptron is to distinguish between positive and
negative  stimuli. 

Let $S_x$ represent a test stimulus, selected from the positive
class. Then the probability of correctly identifying $S_x$ as a positive
stimulus in a random sequence experiment, with S-controlled reinforcement,
is given by equation (7.7), with its parameters defined by equations (7.9)
and (7.11), just as in an ordinary discrimination
experiment. Similarly, if $S_x$ is a noise stimulus, from the
negative class, the probability of obtaining the correct (negative) response
is given by the complement of the probability obtained from equation (7.7).
Some special analytic features of this problem are worth noting.


For  a  binomial  model,  with  a  large  retina  and  large  association 
system (so that all Q-functions and retinal intersections of noise patterns
can  be  assumed  equal  to  their  expected  value)  the  intersection  of  a  noise 
pattern  with  any  other  stimulus  will  be  equal  to  the  expected  value  of  this 

3j< 

intersection.  If  we  designate  the  noise  patterns  by  5„  r  1^,,'’  •••> 
and  positive  stimuli  by  .  then  (as  explained  on  page  146), 


Let  and  .  '  represent  the  same  positive  stimulus  pattern  with 

different  noise  patterns  superimposed.  Then,  if  the  noise  density  is 
low,  ...  ■  .  .  But  .,  •>  .  Therefore, 

0 ,  /  >  -•  'i  ,  which  means  that  the  perceptron  can  be  taught  quite 

readily  to  give  the  proper  positive  response  to  a  test  stimulus,  y 


Actually,  as  noise  patterns  have  been  defined,  the  intersection  of  a 
pure  noise  pattern  with  a  positive  stimulus  pattern  will  be  slightly 
less  than  the  expected  value,  since  some  of  the  points  which  normally 
are  "on"  for  the  positive  stimulus  will  be  turned  "off"  for  the  noise 
pattern.  The  conclusions  above  hold  rigorously  if  the  noise  patterns 
are sets of positive signals only.

The same conclusion does not hold for the identification of a negative
(noise)  stimulus,  however.  In  this  case,  the  generalization  from  a  previously 
trained  noise  stimulus,  to  ,,  is  equal  to  ssuming 

all  noise  stimuli  to  be  equal  in  area  to  their  expected  value).  But  the 
generalization  from  a  positive  stimulus  is  "  ,,,  ^  .b,  ^  which  is  generally 

greater  than  ,  since  the  area  covered  by  the  positive  stimulus  with 

noise  superimposed  is  generally  greater  than  the  area  of  the  noise  stimulus 
alone. Consequently, we would expect the positive response to tend to
generalize to the negative class as well, if both classes are represented
with equal frequency in the training sequence.

A  slight  modification  of  the  perceptron  should  improve  its 
capability  of  distinguishing  negative  stimuli  from  positive  ones.  If  the 
R-unit  is  given  a  threshold  greater  than  zero,  it  will  tend  to  remain  "off" 
for the relatively weak signals coming from noise stimuli, but will go "on"
(to its positive state) for the stronger signals coming from positive stimuli.
With this modification, however, the system is no longer an elementary
perceptron. An alternative procedure, which will improve the performance
of an elementary perceptron, is to "overtrain" the negative stimuli,
composing  a  stimulus  sequence  in  which  negative  stimuli  occur  more 
frequently  than  positive  ones.  In  an  error  correction  experiment,  it 
should  be  noted,  this  bias  will  be  introduced  automatically,  regardless  of 
the  stimulus  sequence,  so  that  a  detection  problem  should  be  solved  much 
more readily than with an S-controlled system.

7.4.2 Detection in Organized Environments

In  an  "organized  environment",  where  the  background  material 
may  closely  resemble  the  stimulus  pattern  in  its  characteristics,  detection 
experiments take on some characteristics of special interest, psychologically.
First of all, it should be noted that in attempting to distinguish a
pattern  such  as  the  letter  "X"  against  a  background  of  lines  occurring  in 
random  configurations,  the  environment  may  include  stimuli  which  are 
fundamentally  ambiguous  in  character,  since  patterns  closely  resembling 
the letter "X", or even identical to it, might arise by a chance superimposition
of straight lines. In such a case, the only reasonable test of
whether  or  not  a  pattern  should  be  identified  as  an  "X"  would  seem  to  be 
the  human  criterion  of  whether  it  looks  more  like  an  X  or  more  like  a 
random  assemblage  of  line  segments.  While  a  similar  problem  might 
arise,  in  principle,  in  the  case  of  detection  experiments  in  noisy  fields,  it 
is  less  common  there,  except  under  extreme  noise  conditions.  In  the  case 
of  organized  fields,  ambiguous  organizations  are  more  the  rule  of  the  day, 
and  the  problem  requires  a  different  approach.  In  human  perception,  the 
properties of "good figure" are generally used to determine whether a
particular  set  of  line  segments  is  seen  as  a  letter,  or  some  other  known 
pattern,  or  simply  as  a  random  collection  of  unrelated  components.  Such 
judgements  are  not  possible,  however,  for  elementary  perceptrons.  We 
will  return  to  the  problem  of  figural  organization  in  Part  IV. 

Treating  the  detection  experiment  simply  as  a  special  case  of 
a  discrimination  experiment,  the  same  conclusions  apply  as  in  the  case 
of the noisy environment problem: it is possible, by exhaustively training
the perceptron with the product set of positive stimuli and irrelevant
patterns, to teach it to identify positive stimuli amidst extraneous material.
The learning is apt to be slow, however, and will generally fall considerably
short of what might be expected in a simpler discrimination experiment.

Most  of  the  experimental  work  done  to  date  on  detection 
experiments has been carried out with the Mark I perceptron using a gamma
system for the memory dynamics. This work will be reviewed in the following
chapter, which deals with γ-perceptrons, but similar results might
be expected with alpha systems.

7.5  Generalization  Experiments 

In the preceding experiments, it has been required that $S_x$
should necessarily occur as one of the stimuli in the training sequence.
When the perceptron is tested with a stimulus which has not been previously
seen, a weak form of generalization is possible with elementary α-systems.
Clearly, if the intersection of $S_x$ with some other stimulus in the same class,
which did occur in the training sequence, is large enough, $S_x$ will
tend to evoke the same response as that stimulus. In this case, $S_x$ is correctly
recognized only because, within the limits of tolerance of the perceptron,
it appears to be identical to, rather than merely similar to, the previously
seen training stimulus. Thus, generalization, for an elementary α-perceptron,
is based on an approximation to identity, rather than on similarity. In a
"pure generalization" experiment, as defined in Chapter 3, the perceptron
would be asked to recognize a pattern in a position where it does not
overlap any previously seen patterns of the same class. If such an
experiment is performed with an α-system, with a single class of
stimuli, the generalization will tend to be positive, due to the fact that $Q_{ij}$
is never zero, for most systems, regardless of the relative positions of
the stimuli. This result is trivial, however, and of no psychological interest,
since any stimulus, whether it resembles the trained stimuli or not, will also
tend to evoke the same response. To prevent such a trivial result, it is
necessary to employ a discrimination test, training the system with two
kinds of stimuli, and then testing it with similar stimuli in a disjoint portion
of the retina to find out whether the appropriate responses have generalized
for both kinds of stimuli. In this case, if the stimuli are of equal area, and
equally trained, no generalization will be found, since the positive generalization
from one class is exactly balanced by the negative generalization
from the other class. Thus it is clear that an elementary α-system (and,
in fact, any elementary perceptron) is incapable of abstracting similarity
(in either the geometric or the psychological sense) but discriminates only
by measuring a function of the overlaps of a test stimulus with representatives
of both classes.

7.6 Summary of Capabilities of Elementary α-perceptrons


The elementary α-perceptrons, being the simplest class
of perceptrons, provide a baseline of performance against which other
systems can be compared. It has been demonstrated that the α-system,
with  both  S-controlled  and  error  correction  reinforcement,  is  capable  of 
discrimination  learning,  provided  it  sees  a  large  representative  sample  of 
the stimuli which it is required to discriminate. It does not generalize
well to similar forms occurring in new positions in the retinal field, and
its performance in detection experiments, where a familiar figure appears
against an unfamiliar background, is apt to be weak. More sophisticated
psychological  capabilities,  which  depend  on  the  recognition  of  topological 
properties  of  the  stimulus  field,  or  on  abstract  relations  between  the 
components of a complex image, are lacking. The elementary perceptron
has  no  capability  of  recognizing  time  sequences,  since  its  responses  are 
based  on  the  momentary  state  of  the  system  due  to  the  current  stimulus 
pattern  alone,  and  are  not  influenced  by  the  preceding  sequence  of  events. 
Quantitative  judgement  might  possibly  be  learned  by  an  exhaustive  training 
procedure,  in  which  the  system  is  required  to  give  one  response  for 
stimuli  above  a  certain  area,  or  over  a  certain  length,  for  example,  and 
an  opposite  response  if  they  fall  short  of  the  criterion.  This  is  a  rather 
crude  approximation  to  quantitative  estimation,  however,  and  the  problem 
can  be  handled  much  more  satisfactorily  with  perceptrons  with  linearly 
responding  R-units,  as  will  be  seen  in  Chapter  10.  In  R-controlled 
experiments, where the perceptron is required to form its own classification
of stimuli, we have seen that the elementary α-perceptron tends either
to classify everything identically (its most general tendency) or else to
form a random dichotomy, which is of no psychological interest. It will
be found that most of the weaknesses of elementary α-perceptrons are
true  of  all  simple  perceptrons,  and  that  it  is  necessary  to  go  to  topologically 
more  complicated  systems  to  find  performances  which  are  basically  more 
satisfactory.  In  special  cases,  however,  other  types  of  simple  perceptrons 
have advantages, as will be seen in the following chapters.

7.7 Functionally Equivalent Systems


It  may  be  disturbing  to  some  biologically  oriented  readers  to 
think  of  an  association  unit  that  changes  the  sign  of  its  output  signal  from 
excitatory  to  inhibitory  as  a  function  of  its  training.  This  is  a  conceptual 
simplification  which  makes  analysis  easier,  but  can  be  shown  to  be  logically 
equivalent  to  an  alternative  model  in  which  particular  neurons,  or  A-units, 
are  designated  as  excitatory,  and  others  as  inhibitory,  with  no  change 
permitted  in  the  sign  of  their  outputs.  The  alternative  model  (which  is 
analogous  to  the  models  originally  presented  in  Refs.  79  and  80)  is  as 
follows:


Let the number of A-units be twice the number in the equivalent
α-perceptron. Let half of the A-units be designated as excitatory units,
and the other half be inhibitory units. All connection values are initially
assumed to be zero, or else to have positive signs if $a_i$ is excitatory,
negative signs if $a_i$ is inhibitory. Each excitatory unit is paired with one
of the inhibitory units, and the same origin point configuration is assigned
to both members of the pair. Thus the responses of the inhibitory units
exactly duplicate the responses of the excitatory units. The reinforcement
rule is that a positive reinforcement from the r.c.s. affects only the
excitatory units, while a negative reinforcement affects only the inhibitory
units. With this rule, the signal which goes to the R-unit in response to any
stimulus is the sum of an excitatory component and an inhibitory component,
the total being exactly equal to what it would be in the equivalent
α-perceptron.

The exact pairing of the excitatory and inhibitory units is, of
course, an inessential artifact, introduced only to guarantee that the two
types of systems are truly identical in performance. If the origin
configurations of all units are selected independently of one another, the
expected values of the signals will be unaffected, but the variability will be
somewhat increased, due to the greater number of independent A-units
contributing to the signal. Such a system has been previously described as
a "differentiated A-system" (Ref. 79).
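
The equivalence of the paired model can be checked numerically. The following is a minimal sketch, not the author's procedure: it assumes a simple additive rule in which every active A-unit receives the signed increment, and the unit count, stimuli, and training order are hypothetical values chosen only for illustration.

```python
import random

def train_signed(active_sets, increments, n_units):
    """Single-population model: each active unit receives the signed increment."""
    values = [0.0] * n_units
    for active, inc in zip(active_sets, increments):
        for i in active:
            values[i] += inc
    return values

def train_paired(active_sets, increments, n_units):
    """Paired model: unit i has an excitatory copy and an inhibitory copy with
    identical activity. Positive increments go only to the excitatory copy,
    negative increments only to the inhibitory copy."""
    v_exc = [0.0] * n_units
    v_inh = [0.0] * n_units
    for active, inc in zip(active_sets, increments):
        target = v_exc if inc > 0 else v_inh
        for i in active:
            target[i] += inc
    return v_exc, v_inh

def signal(values, active):
    """Sum of the values of all connections from active units."""
    return sum(values[i] for i in active)

rng = random.Random(0)
N_UNITS = 12  # hypothetical number of A-units in the single-population model
stimuli = [frozenset(rng.sample(range(N_UNITS), 4)) for _ in range(5)]
order = [rng.randrange(len(stimuli)) for _ in range(30)]
active_sets = [stimuli[k] for k in order]
# Hypothetical labels: even-numbered stimuli reinforced positively, odd negatively.
increments = [1.0 if k % 2 == 0 else -1.0 for k in order]

signed = train_signed(active_sets, increments, N_UNITS)
v_exc, v_inh = train_paired(active_sets, increments, N_UNITS)
for s in stimuli:
    print(signal(signed, s), signal(v_exc, s) + signal(v_inh, s))
```

In this sketch the excitatory values never become negative and the inhibitory values never become positive, yet the two printed columns agree for every stimulus.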




8. PERFORMANCE OF ELEMENTARY γ-PERCEPTRONS IN PSYCHOLOGICAL EXPERIMENTS


It will be recalled that the reinforcement rule for a gamma
system (defined in Chapter 4, Def. 38) is one which guarantees that the
sum total of the value of all connections to any unit remains constant, even
though the values of individual connections may change with time. In the
notation of the last chapter, the change in the value of the connection from
the A-unit $a_i$ due to the reinforcement of stimulus $S_j$ was given by

    $\Delta v_i = \eta \rho_j \, a_i^*(S_j)$   for an α-system.   (8.1)

For a gamma system, the corresponding expression is

    $\Delta v_i = \eta \rho_j \left[ a_i^*(S_j) - \frac{n_j}{N_a} \right]$   (8.2)

where $n_j$ is the number of A-units responding to $S_j$. A variation of the
gamma system, which will be designated the γ′-system, is of interest chiefly
because it is considerably easier to analyze. For this model,

    $\Delta v_i = \eta \rho_j \left[ a_i^*(S_j) - Q_j \right]$   (8.3)

where $Q_j$ is the probability that an A-unit responds to $S_j$. This is equal
to the expected value of $\Delta v_i$ for the γ-system, and with large values of
$N_a$ the γ-system and γ′-system become indistinguishable.
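
The difference between the three memory rules can be seen in a small numerical sketch. It rests on assumptions that go beyond this section: increments are written for a single reinforcement of magnitude `eta`, the sign of the reinforcement is folded into `eta`, and the unit count, active set, and value of `q` are hypothetical.

```python
def delta_alpha(active, n_units, eta=1.0):
    """Alpha rule: only the active connections change."""
    return [eta if i in active else 0.0 for i in range(n_units)]

def delta_gamma(active, n_units, eta=1.0):
    """Gamma rule: active connections gain, and every connection loses the
    mean gain, so the increments sum to zero."""
    mean_activity = len(active) / n_units
    return [eta * ((1.0 if i in active else 0.0) - mean_activity)
            for i in range(n_units)]

def delta_gamma_prime(active, n_units, q, eta=1.0):
    """Gamma-prime rule: the observed mean activity is replaced by its
    assumed expected value q."""
    return [eta * ((1.0 if i in active else 0.0) - q)
            for i in range(n_units)]

N_UNITS = 8
active = {0, 3, 5}  # hypothetical set of A-units responding to the stimulus

d_alpha = delta_alpha(active, N_UNITS)
d_gamma = delta_gamma(active, N_UNITS)
d_gamma_prime = delta_gamma_prime(active, N_UNITS, q=0.25)

print(sum(d_alpha), sum(d_gamma), sum(d_gamma_prime))
```

Only the gamma rule conserves the total value on every reinforcement. The gamma-prime rule conserves it on the average, that is, when the observed fraction of active units happens to equal `q`.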


The organization of this chapter will follow closely that
of Chapter 7. The first section deals with the analysis of discrimination
experiments with S-controlled reinforcement, and presents results of a
number of experiments, including comparisons with the α-systems
considered in the last chapter. Discrimination experiments with error
correction, and discrimination experiments with R-controlled reinforcement
are then presented, and the final sections deal with detection
experiments, and other performances of γ-perceptrons.

8.1.  Discrimination  Experiments  with  S-controlled  Reinforcement 


8.1.1  Fixed  Sequence  Experiments:  Analysis 


As in the case of the alpha-system analysis, our object is to
compute the ratio $E(u_x)/\sigma(u_x)$, for the class of perceptrons, test
stimulus, and training sequence under consideration. The notation and
definitions correspond to those employed in Chapter 7. The analysis again
follows that of Joseph (Ref. 41). For the γ-system, the expected value
of $u_x$ is obtained as follows: The value of the connection from the A-unit
$a_i$ at the end of the training sequence is given by:

    $v_i = \eta \sum_j \rho_j m_j \left[ a_i^*(S_j) - \frac{n_j}{N_a} \right]$

Consequently, if the test stimulus $S_x$ is now shown, the input to the
response unit will be

    $u_x = \eta \sum_j \rho_j m_j \left( n_{jx} - \frac{n_j n_x}{N_a} \right)$

yielding, for the expected value of the signal $u_x$,

    $E(u_x) = \eta (N_a - 1) \sum_j \rho_j m_j (Q_{jx} - Q_j Q_x)$   (8.4)

For a γ′-system, the analysis is considerably simplified. In this case,
the value of the connection from unit $a_i$ at the end of the training sequence
is

    $v_i = \eta \sum_j \rho_j m_j \left[ a_i^*(S_j) - Q_j \right]$

Collecting the signals from all active connections when $S_x$ occurs yields
the input to the R-unit,

    $u_x = \eta \sum_j \rho_j m_j (n_{jx} - Q_j n_x)$

and the expected value of this signal is

    $E(u_x) = \eta N_a \sum_j \rho_j m_j (Q_{jx} - Q_j Q_x)$   (8.5)

The variance of $u_x$ is again computed from the general
equation (7.4), given in the last chapter. For a γ′-system, the same
considerations apply as in the α-system, namely, that the only source
of variability in the signals $c_i(x)$ is due to the origin point configurations
of the A-units, which are selected independently for the different A-units.
Consequently, the equation (7.5) holds identically for a γ′-system. In a
true γ-system, however, the signals $c_i(x)$ are not independent. The
value $v_i$, upon which $c_i$ depends, is the result of a series of increments,
$\Delta v_i$, each of which depends upon the particular set of A-units which are
active at the time of reinforcement (as shown in Equation 8.2). Consequently,
for a gamma system, the variance is

    $\sigma^2(u_x) = N_a \, \sigma^2[c_i(x)] + (N_a^2 - N_a) \, \mathrm{cov}[c_i(x), c_k(x)]$   (8.6)

The reader who is interested in the detailed analysis of this expression
will find a full algebraic expansion of its components in Ref. 41. The
final equation which results is as follows:

    [equation not legible in source]   (8.7)

An analogous treatment for the γ′-system, based on Equation (7.5), yields
the expression:

    [equation not legible in source]   (8.8)

For both the γ′-perceptron and the γ-perceptron, the expectation of $u_x$
and the variance of $u_x$ are both on the order of $N_a$. Consequently,
the ratio $E(u_x)/\sigma(u_x)$ can be made arbitrarily large by increasing $N_a$.
This means that the theorem stated in the last chapter (Page 159) holds for
γ′ and γ-perceptrons as well as for α-systems. Equation (7.7)
can again be used for a close approximation to the actual probability of
correct response for a γ′ or γ-perceptron, substituting the appropriate
expressions for the mean and variance in each case.
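
The scaling argument can be made concrete with a short sketch. Equation (7.7) is not reproduced in this section, so the code assumes it has the usual normal form, with the probability of a correct response given by the standard normal distribution function evaluated at the ratio of mean to standard deviation. The per-unit constants `mean_per_unit` and `var_per_unit` are hypothetical placeholders, not values from the text.

```python
import math

def normal_cdf(z):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def p_correct(n_a, mean_per_unit, var_per_unit):
    """Assumed normal approximation: mean and variance both grow linearly
    with the number of A-units, so their ratio grows as sqrt(n_a)."""
    mean = mean_per_unit * n_a
    sigma = math.sqrt(var_per_unit * n_a)
    return normal_cdf(mean / sigma)

for n_a in (10, 100, 1000, 10000):
    print(n_a, p_correct(n_a, mean_per_unit=0.05, var_per_unit=1.0))
```

Because the mean is proportional to $N_a$ and the standard deviation to its square root, the printed probabilities rise toward 1 as the number of A-units is increased, however small the per-unit bias.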


It is interesting to note that if the expected values of the
generalization coefficients, $g_{ij}$, are substituted into equations (7.3),
(8.4), and (8.5), identical expressions are obtained for the expectation
of $u_x$ for the α, γ′, and γ-systems. The expected value of
the un-normalized coefficient, $G_{ij}$, for a γ-perceptron is
$(N_a - 1)(Q_{ij} - Q_i Q_j)$; for a γ′-perceptron it is $N_a (Q_{ij} - Q_i Q_j)$,
while for an α-perceptron it is $N_a Q_{ij}$. Substituting these quantities, we
obtain, for all three systems,

    [equations not legible in source]   (8.9), (8.10)

The special properties of the γ and γ′-perceptrons are due to the
fact that their generalization coefficients for a binomial model tend to be
negative for sufficiently well separated, or disjoint, stimuli, whereas in
the case of an α-system, the generalization coefficients are all
non-negative. In a Poisson model, while it is possible for negative
generalization coefficients to occur due to random variability of individual
perceptrons, the expected values of $g_{ij}$ are always non-negative, since
$Q_{ij} = Q_i Q_j$ only if the stimuli are disjoint. These features are of
interest for R-controlled experiments, as will be seen presently.

8.1.2  Fixed  Sequence  Experiments:  Examples 

Numerical analyses have been carried out mainly for the
γ′-perceptrons, since the equations are considerably simpler. For
large values of $N_a$, the γ′ and γ-systems will have identical
performances. Tables 3 and 4 (in Chapter 7) apply identically to the γ′-system,
for Experiments 1 and 2. The performance curves shown in Figures 13 and
14 are also applicable. Figure 21 shows a comparison of the γ and
α-systems on Experiment 1 (horizontal vs. vertical bar discrimination), for
the optimum parameters with a binomial model ($x = 2$, $y = 1$, $\theta = 2$).
Figure 22 shows a similar comparison for the same parameters, with
Experiment 2.


It is clear that under the conditions of Experiments 1 and 2, the
γ-systems have no advantage over the α-perceptrons. The equivalence
of the curves is due to the fact that in these experiments, all stimuli are
equal in area (yielding equal $Q_i$ for all stimuli), the number of stimuli in
each class is equal, and all stimuli occur with equal frequency. If the sizes
or frequencies are unequal, the γ-system may have a marked advantage,
as will be seen in the analysis of Experiment 4, in Section 8.1.4.

Figure 22  PROBABILITY OF THE CORRECT IDENTIFICATION OF ANY TEST BAR vs. THE
NUMBER OF ASSOCIATION UNITS, IN EXPT. 2

x = 2, y = 1, θ = 2, ALTERNATING DICHOTOMY

8.1.3 Random Sequence Experiments: Analysis

The un-normalized generalization coefficients for a γ and
γ′-system are given by

    $G_{ij} = n_{ij} - \frac{n_i n_j}{N_a}$   for a γ-system   (8.11)

    $G_{ij} = n_{ij} - Q_i n_j$   for a γ′-system   (8.12)

where $n_{ij}$ = the number of A-units responding both to $S_i$ and to $S_j$.
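
The sign of these coefficients can be checked by a short expectation calculation. The following is a sketch under assumptions that are not restated in this section: the $N_a$ A-units respond independently of one another, each responds to $S_i$ with probability $Q_i$, and each responds to both $S_i$ and $S_j$ with probability $Q_{ij}$.

```latex
\begin{aligned}
E(n_{ij}) &= N_a Q_{ij}, \qquad E(n_j) = N_a Q_j, \\
E(n_i n_j) &= N_a Q_{ij} + N_a (N_a - 1)\, Q_i Q_j, \\
E\!\left( n_{ij} - \frac{n_i n_j}{N_a} \right)
  &= (N_a - 1)\,(Q_{ij} - Q_i Q_j), \\
E\!\left( n_{ij} - Q_i n_j \right)
  &= N_a\,(Q_{ij} - Q_i Q_j).
\end{aligned}
```

Under these assumptions both expectations are negative whenever $Q_{ij} < Q_i Q_j$, vanish when $Q_{ij} = Q_i Q_j$, and differ from one another only by the factor $(N_a - 1)/N_a$, which approaches 1 for large $N_a$.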

As in the α-system analysis (Section 7.1.4) the training
sequence is assumed to consist of $T$ stimuli, where each stimulus, $S_j$,
has a probability $p_j$ of being selected at any step of the training sequence.
The analysis has been carried out only for the γ′-perceptron, since the
true γ-system leads to excessively cumbersome expressions for the
variance. For large $N_a$, as observed in the preceding section, the two
systems should be virtually indistinguishable in performance.

For the γ′-system, the input to the response unit when $S_x$
occurs after the training sequence is

    $u_x = \eta \sum_j \rho_j m_j (n_{jx} - Q_j n_x)$

where $m_j$, as before, is the number of times that $S_j$ occurs in the
training sequence. Taking the expected value of this expression, we
obtain

    $E(u_x) = \eta N_a T \sum_j \rho_j p_j (Q_{jx} - Q_j Q_x)$   (8.13)
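
The relation between the accumulated connection values and the closed form for the signal can be verified on a toy case. This is a sketch under stated assumptions, not the analysis of Ref. 41: the increment on each reinforcement is taken to be `ETA * RHO[j] * (activity - Q[j])`, the values start at zero, and the unit count, response probabilities, class signs, and training sequence are all hypothetical.

```python
import random

rng = random.Random(1)
N_A = 50                 # hypothetical number of A-units
Q = [0.25, 0.5, 0.25]    # assumed response probabilities for three stimuli
RHO = [1, -1, 1]         # assumed reinforcement signs for the three stimuli
ETA = 1.0                # assumed magnitude of each reinforcement

# activity[j] is the set of A-units that respond to stimulus j.
activity = [{i for i in range(N_A) if rng.random() < q} for q in Q]

def train(sequence):
    """Accumulate connection values over the training sequence."""
    values = [0.0] * N_A
    for j in sequence:
        for i in range(N_A):
            a = 1.0 if i in activity[j] else 0.0
            values[i] += ETA * RHO[j] * (a - Q[j])
    return values

def signal(values, x):
    """Input to the R-unit: sum of values over units active for stimulus x."""
    return sum(values[i] for i in activity[x])

def closed_form(sequence, x):
    """Sum over stimuli of rho_j * m_j * (n_jx - Q_j * n_x), times ETA."""
    n_x = len(activity[x])
    total = 0.0
    for j in range(len(Q)):
        m_j = sequence.count(j)
        n_jx = len(activity[j] & activity[x])
        total += ETA * RHO[j] * m_j * (n_jx - Q[j] * n_x)
    return total

sequence = [rng.randrange(len(Q)) for _ in range(40)]
values = train(sequence)
for x in range(len(Q)):
    print(x, signal(values, x), closed_form(sequence, x))
```

The two printed columns agree for each test stimulus, which shows that the signal depends on the training sequence only through the counts $m_j$ and on the perceptron only through the counts $n_{jx}$ and $n_x$.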

The variance of $u_x$ over both perceptrons and training sequences is
again given by equation (7.10). In the present case, this yields:

    [equation not legible in source]   (8.14)

The detailed derivation of this expression can be found in Ref. 41. It can
readily be seen that the theorem of Section 7.1.2 continues to hold for this
system. Actual performances can again be calculated by using Equation (7.7).


8.1.4  Random  Sequence  Experiments:  Examples 

A comparison of binomial α and γ′-perceptrons on the
random sequence version of the horizontal/vertical bar experiment
(Experiment 3) is shown in Figure 23. A curve obtained from the simulation
of a true γ-system with the same parameters is included for comparison.
The simulation curve shows the average of 100 runs. Figure 24 compares
the performance of the binomial model with that of a Poisson model, on the
same experiment.

In Figure 25, the performance of a γ′-system in the
"frequency bias" experiment (Experiment 4) is shown, with the mean
performance curve of the equivalent α-system, from Figure 14,
included for comparison. A comparison with Figure 16 shows that under
conditions of unequal frequency for the two classes to be discriminated,
the γ-system may have a marked advantage. The effect of frequency
bias on a γ-system is also shown in a number of simulation experiments
with the IBM 704 computer, which have been described previously (Ref. 84).
The horizontal/vertical bar discrimination problem happens to show the
γ-system to its best advantage, since, with a binomial perceptron, the
expected value of the generalization coefficient, $g_{ij}$, where $S_i$ and $S_j$
are in opposite classes, is zero for this particular problem. A Poisson
model, where the interaction between the horizontal and vertical bar classes
is non-zero, would not perform as well in this experiment, and the binomial
model would also perform less well in experiments with classes of stimuli
which could achieve greater intersections.

Figures 26, 27 and 28 show some typical experiments performed
with a digital simulation program, for binomial γ-perceptrons of sizes up
to [value not legible in source], and a 72 by 72 retina. The stimuli are kept
within the retinal field in these experiments by requiring that their centers
remain within a 13 by 13 field, so that there are 169 possible positions for each
stimulus. In Figure 26(b), the effect of allowing rotations up to 30 degrees
and up to 359 degrees (inclusive), in addition to displacements within the
retinal field, is illustrated. Figure 28 shows the effect of size bias where
one class of stimuli (the letter "F") can be considered as subsets (on the
retina) of stimuli of the other class (the letter "E"). With purely excitatory
connections from the retina, the situation is clearly much worse than with
both excitatory and inhibitory connections, as shown in Figures 28(a) and (b).

Figure 27  SQUARE-DIAMOND DISCRIMINATION. $N_a$ = 1000, x = 10, y = 0, θ = 4
CENTERS PLACED IN 13 x 13 FIELD
(Horizontal axis: TIME (NO. OF STIMULI))

From the equations for the expected value of the signal
(Equation 8.13, for example) it can be seen that a bias in the correct
direction may exist even when the perceptron is occasionally reinforced
in the wrong direction. Several experiments have been carried out by
Hay using the Mark I perceptron at CAL, to study the effect of "random
errors" by the experimenter training the machine (Ref. 30). In an
experiment on the discrimination of the letters "E" and "X" with a
γ-perceptron employing S-controlled learning, it was found that the perceptron
learned to discriminate the letters with 100% accuracy despite the introduction
of 30% misidentifications by the experimenter (i.e., by the r.c.s.). This
experiment emphasizes the fact that the perceptron can exceed the level
of performance of its "teacher" or reinforcement control system.
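
The tolerance for teacher errors follows from the fact that the expected signal is linear in the reinforcement signs. The sketch below is an illustration under an added assumption, namely that each reinforcement is reversed independently with a fixed probability `error_rate`; the function names and trial counts are hypothetical, and the code does not model the Mark I experiment itself.

```python
import random

def expected_net(n_trials, error_rate):
    """Expected sum of reinforcement signs when the correct sign is +1 and
    each sign is reversed independently with probability error_rate."""
    return n_trials * (1.0 - 2.0 * error_rate)

def simulated_net(n_trials, error_rate, seed=0):
    """One simulated training run with a fallible teacher."""
    rng = random.Random(seed)
    net = 0
    for _ in range(n_trials):
        net += -1 if rng.random() < error_rate else 1
    return net

print(expected_net(1000, 0.3), simulated_net(1000, 0.3))
```

With 30% misidentifications the expected contribution of each stimulus is reduced to 40% of its error-free value but keeps the correct sign. The bias disappears only when the error rate reaches 50%.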

8.2 Discrimination Experiments with Error-Corrective Reinforcement


While it has been demonstrated in Chapter 5 (Theorem 8) that
the error correction procedure will not always lead to a solution with the
γ-system, practical systems seem to work about as well as α-systems,
and may actually learn somewhat faster in some cases. Figures 29 and 30
illustrate two sets of experiments on γ-perceptrons, using the
Burroughs 220 computer at Cornell University, in which performance is
compared with perceptrons having the same topological organizations, but
employing an α-system memory rule. Since the error correction
procedure will lead to a solution regardless of sequence or relative
frequency of stimuli in the classes being discriminated, and regardless
of relative sizes of stimuli, the special advantages of the γ-system in
overcoming frequency bias and size bias are relatively unimportant here.
In most experiments with error-corrective reinforcement, therefore, the
simpler α-rule is generally employed.




8.3  Discrimination  Experiments  with  R-controlled  Reinforcement 


The performance of a γ-perceptron in R-controlled
experiments (where the r.c.s. is entirely isolated from the environment
and reinforces the perceptron positively at all times, regardless of what
its current response happens to be) is somewhat more interesting than that
of the α-perceptron. Since it is possible to have negative generalization
coefficients for the γ-model, two distinct possibilities suggest themselves
which were not present before: (1) The system may form an unstable
classification of the environment, with individual stimuli continually shifting
membership from one class to the other, due to negative interaction between
successive reinforcements; (2) the system may form a stable dichotomy with
some stimuli in the positive class and some in the negative class. The third
possibility corresponds to the expected situation with an α-system, namely:
(3) The system may form a stable classification with every stimulus in the
same class, the alternative class being empty.

An unpublished theorem by H. Kesten (personal communication) proves that
(for a γ-system in which the values are allowed to grow without bound) the
first alternative is impossible. Every perceptron will ultimately form a
"stable" classification, in which every stimulus is assigned to one of the two
classes and will remain in the same class with probability 1 at any future
time. The remaining two alternatives both remain possible, however.

At the present time, a fully satisfactory analysis of the
classification tendencies of γ-perceptrons which are "left on their own" in an
R-controlled experiment is not available. A number of special cases can
be analyzed heuristically, however, and some of these are illuminating.
Moreover, a series of simulation experiments has been completed which
illustrates performance on some typical problems.

The basic feature of this system in an R-controlled experiment
is a tendency to classify stimuli on the basis of retinal location, rather than
geometrical similarity. If two stimuli occur in the same location on the
retina, covering largely the same set of sensory points, $g_{ij}$ will tend to
be positive, so that the reinforcement of one stimulus will tend to generalize
automatically to the other. A "cluster" of such stimuli, projected onto a
limited region of the field, will tend to be classified the same way, either all
positive or all negative. On the other hand, two stimuli which cover disjoint
sensory sets will (in a binomial model) tend to have a negative $g_{ij}$. In
this case, reinforcing $S_i$ with a positive reinforcement will automatically
assign $S_j$ to the negative class, if its value was previously zero. Thus,
clusters of stimuli which are "well separated" will tend to go into opposite
classes, with a binomial γ-perceptron. The following experiment illustrates
this tendency quite clearly:

EXPERIMENT 9: For the same retina and environment of horizontal and
vertical bars described in Experiment 1, let the stimuli occur in a random
sequence, as in Experiment 3. During the training sequence, R-controlled
reinforcement is employed. The response to each of the 40 bars is then
determined, to establish the classification which has been developed by the
perceptron.

In a Poisson model, the expectation of $g_{ij}$ for disjoint stimuli is zero,
in the γ-system, and all stimuli will tend to go into the same class
unless they form completely disjoint clusters, in which case the class
assignment will be random for each cluster.

In a number of repetitions of this experiment (which was
simulated with a 704 computer for a very large, or infinite, binomial
perceptron), it was found in every case that the perceptron placed ten
adjacently located horizontal bars and ten adjacent vertical bars in the
positive class, and the other ten bars of each type in the negative class.

The dynamics of the process can be readily followed in a heuristic fashion.
The first bar to be seen, say a vertical bar, may evoke a positive or
negative response at random. If $r = +1$, then the connections from the
responding A-units will each gain a positive increment of value, and connections
from inactive A-units will become slightly negative, so that the total value is
conserved. For two disjoint bars in the "same" class (i.e., both horizontal
or both vertical) $g_{ij}$ will be negative, but for the two closest neighbors on
either side, $g_{ij}$ will be positive. The generalization, $g_{ij}$, to members
of the "opposite" class (i.e., one horizontal and one vertical) will be zero,
since the intersection between any horizontal and vertical bar, in this
environment, is equal to its expected value, yielding zero generalization for
a binomial γ-system (see Page 146). Consequently, the horizontal and
vertical bars will never interact, regardless of the sequence in which they
occur, and each of these two sets of stimuli will organize independently.
Consider, therefore, the development of a classification for the vertical bars,
after the first has been associated to $r = +1$. If the second vertical bar
in the training sequence should happen to be one of the two close neighbors
on either side of the original bar, this will immediately evoke the response
$r = +1$, and will be reinforced in the same direction as the previous bar,
extending the net positive generalization to at least one additional member of
the vertical set. At the same time, vertical bars which are more than two
positions removed from both of the bars already seen will now have twice
the negative reinforcement that they received before, due to the summation
of the negative $g_{ij}$. If one of these bars should occur, the response will
be -1 and the reinforcement will be negative. This will not only spread
negative value to the adjacent stimuli, but will add to the positive value of
the stimuli which were previously placed in the positive class. Thus two
mutually supporting "nuclei" of stimuli are formed, one in the positive class
and one in the negative class, which tend to spread their domain to neighboring
stimuli, but tend to "repel" remote stimuli, supporting their adhesion to the
opposite class. Under these conditions, it is plausible that the most stable
balance between classes will be found when the classes are evenly divided, each
tending to attract marginal stimuli from the other to the same degree.

Simulation experiments with this procedure show that a stable
dichotomy tends to be formed after the first few hundred stimuli of the
training sequence, the probability of a change in class membership being
very small thereafter. The terminal condition is of the type indicated above,
with 10 horizontal and 10 vertical bars in each class of the dichotomy.
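
The heuristic account above can be turned into a toy simulation. This is only a sketch of the mechanism, not a reproduction of the 704 experiments: it treats one set of 20 bars, works directly with signals rather than A-units, arranges the bars on a ring, and uses an invented table of generalization coefficients (positive for a bar and its two nearest neighbors on either side, negative for more remote bars). All names and numerical values are hypothetical.

```python
import random

N_BARS = 20  # one orientation only; the two orientations do not interact

def ring_distance(i, j, n=N_BARS):
    """Separation of two bar positions, assuming a ring arrangement."""
    d = abs(i - j) % n
    return min(d, n - d)

def coefficient(i, j):
    """Hypothetical generalization coefficient between bars i and j."""
    return {0: 3.0, 1: 2.0, 2: 1.0}.get(ring_distance(i, j), -0.5)

def reinforce(u, x, r):
    """Reinforcing bar x with sign r shifts the signal of every bar k."""
    for k in range(len(u)):
        u[k] += r * coefficient(x, k)

def run(n_steps=2000, seed=0):
    """R-controlled training: each bar is reinforced in the direction of the
    response it currently evokes (chosen at random while its signal is zero)."""
    rng = random.Random(seed)
    u = [0.0] * N_BARS
    for _ in range(n_steps):
        x = rng.randrange(N_BARS)
        if u[x] > 0:
            r = 1
        elif u[x] < 0:
            r = -1
        else:
            r = rng.choice((1, -1))
        reinforce(u, x, r)
    return [1 if s > 0 else -1 for s in u]

classes = run()
print(classes)
```

The resulting partition depends on the seed and on the invented coefficients, so the sketch should be read as a way to experiment with the "attract neighbors, repel remote stimuli" dynamics rather than as evidence for the even division reported in the text.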

8.4  Detection  Experiments 


In detection experiments, the same general conclusions hold
true as in the case of α-systems (Section 7.4). In the case of noisy
environments with a large retina, it was noted that the intersection of a
noise pattern with any other stimulus will be equal to the expected value
of the intersection, i.e., to the product of the measures of the active
S-sets. For the binomial γ-system, this implies zero generalization
from a reinforced "positive" stimulus to a noise pattern, and zero
generalization from one noise pattern to another. This means that a
class of positive stimuli can be learned without any generalization to noise
patterns, but that negative training on a limited sample of noise patterns
does not generalize effectively to new noise patterns. As in the case of
the α-system, the use of a threshold greater than zero on the R-units
should effectively separate positive stimuli from noise patterns. It is
worth noting that for discriminating a single class of positive stimuli
from noise, a monopolar reinforcement system (Definition 35, Chapter 4)
will work as effectively as a bipolar system, since reinforcement given for
negative responses has little or no effect on future performance (except for
those noise patterns actually seen, or nearly identical to those seen).
Several experiments have been performed with the Mark I
perceptron at CAL to evaluate the performance of γ-perceptrons in noisy
environments, and in problems in which positive stimuli such as letters of
the alphabet have been mixed with extraneous, but similarly organized
stimuli (geometric patterns, other letters, etc.). Performance on the
discrimination of the letters "E" and "X," with various amounts of noise
present has been reported by Hay in Ref. 30. Two 240 A-unit perceptrons
were tested, both learning to perfection in the absence of noise. With noise
present, one perceptron learned as well as before, the second falling to
about 75% accuracy. The amount of noise introduced was not carefully
quantified in these experiments, but it is clear that the perceptron can
perform appreciably better than chance as long as a human observer can
still detect the original letters embedded in the image. In the experiments
with superimposed images of irrelevant patterns, a poorer level of
performance is obtained. A perceptron trained to respond positively to
the letter X, with monopolar γ-reinforcement, will generally give the
proper response whenever an "X" is present, but tends to give the
positive response quite frequently to triangles, squares, or other letters as
well. The introduction of a high response threshold improves performance
considerably, but a system capable of responding in terms of figure-ground
organization would clearly have a great advantage in such experiments. As
the quantity of background material is increased, the performance of an
elementary perceptron in detection experiments deteriorates rapidly.

A striking difference between an elementary perceptron and a
human observer in detection experiments is that the human will show vast
differences in performance depending upon organizational properties of the
background and its relationship to the figure. For example, the human
observer will readily recognize the letter "E" in Figure (a), but will find
it hard to segregate the "E" from the extraneous lines in Figure (b). An
elementary perceptron would show little or no difference between these two
situations.


Typical  test  patterns  for  detection  experiments 

8.5 Generalization and Other Capabilities


In "pure" generalization experiments, where the test stimuli
are disjoint from the training stimuli, the γ-system has no advantages
over the α-system. In fact, the binomial γ-system, due to its
negative $g_{ij}$ for disjoint stimuli, will actually tend to place a disjoint
stimulus in the opposite class from the reinforced stimulus, unless members
of the opposite class have also been reinforced, in which case the effects tend
to cancel.


Where the training stimuli cover the retina in a representative
sample of locations, the gamma system has the possible advantage of low
or negative generalization to patterns which have small intersections with
the trained patterns. This shows best in such experiments as the
horizontal/vertical bar discrimination experiment, where generalization from
horizontal to vertical bars is zero. As was noted in the case of R-controlled
discrimination experiments, generalization in γ-systems, as with all elementary
perceptrons, tends to be based on the location rather than the similarity of
the stimuli, in any more fundamental sense. Ideally, we would hope to find
a system in which $g_{ij}$ is large for all pairs of stimuli, $S_i$ and $S_j$, which
are "similar" or "equivalent" under some group of spatial transformations,
such as rigid motions, dilatations, or projective transformations, and small
or negative otherwise. Except in exceptional and highly restrictive
environmental conditions, this condition is not to be found in elementary
perceptrons. Highly artifactual organizations which have the required
property can be designed in the case of four-layer series coupled perceptrons,
as will be seen in Chapter 15. Systems which spontaneously acquire the
required organizational properties are found chiefly among the
cross-coupled perceptrons, however, and will be discussed in Part III of this
volume.


In general, it is seen that $\gamma$-perceptrons have much the same
properties as $\alpha$-systems. In S-controlled experiments, especially with
frequency and size bias present, they perform somewhat better, but in
error correction experiments there is little to be gained from the gamma
rule, and there is the possibility that the $\gamma$-system may fail to work
where an $\alpha$-system would have succeeded, as proven in Chapter 5. The
performance in R-controlled experiments is somewhat more interesting
than that of $\alpha$-systems, but the classifications which are formed
spontaneously tend to be based on the position of stimuli on the retina,
rather than on similarity, and are consequently of minimum psychological
interest.

The $\gamma$-system may be somewhat more plausible as a biological
memory mechanism, due to its fundamental conservative property. If
biological memory is due to a physical process which maintains some over-all
equilibrium, such as a chemical substance the total amount of which
remains invariant, or a competition among afferent processes for "Lebensraum"
in the neighborhood of an efferent neuron, this property would certainly be
indicated. It should be emphasized, however, that the conservation of the
total value, as in the systems considered in this chapter, is insufficient to
keep individual coupling coefficients, $v_{ij}$, from becoming indefinitely
great, since they may be balanced by negative values of equal magnitude.
Such a condition is quite implausible in any real physical system. In the
next chapter, elementary perceptrons with memory dynamics which limit
the growth of the values are considered.
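
The point about conservation can be illustrated with a minimal sketch. It assumes a simple sum-preserving rule in which each active connection gains `eta * (1 - q)` and each inactive connection loses `eta * q`, where `q` is the active fraction; the function name and the four-connection example are hypothetical and are not taken from the text.

```python
def conservative_update(values, active, eta=1.0):
    """One sum-preserving reinforcement step (illustrative assumption).

    Active connections gain eta * (1 - q); inactive ones lose eta * q,
    where q is the fraction of connections that are active.
    """
    n = len(values)
    q = len(active) / n
    return [
        v + eta * ((1.0 if i in active else 0.0) - q)
        for i, v in enumerate(values)
    ]


values = [0.0, 0.0, 0.0, 0.0]
for _ in range(100):
    # The same two connections are reinforced on every step.
    values = conservative_update(values, active={0, 1})

total = sum(values)                     # conserved throughout
largest = max(abs(v) for v in values)   # grows linearly with the step count
```

The total stays fixed while the individual magnitudes grow with every step, which is the unbounded growth the paragraph above describes.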
9. ELEMENTARY PERCEPTRONS WITH LIMITED VALUES

Two basically different mechanisms for limiting the growth of
values, $v_{ij}$, will be considered in this chapter. The first mechanism
is a simple upper and lower bound, such that the value may grow up to the
designated limit but no further. Systems employing this mechanism show
"saturation properties" as the connections attain their limits. The second
mechanism is an exponential decay, which determines an equilibrium point
for each $v_{ij}$ depending upon the frequency with which it is reinforced.
If the decay rate is very small, such systems tend to approach a terminal
state resembling the performance characteristics of a perceptron with
unlimited values after a long training sequence. Systems with strictly bounded
values will be considered first.

9.1 Analysis of Systems with Bounded Values

Two types of analysis have been carried out for systems
having upper and lower bounds for $v_{ij}$. The first deals with the
terminal distribution of the values after a long period of exposure to a
random sequence of stimuli, with S-controlled reinforcement. The second
deals with the actual performance of a bounded-value perceptron. In both
cases, we will follow the method of analysis originally employed by
Joseph,* in connection with bounded $\gamma'$-perceptrons (Ref. 41). All of
these analytic results apply to experimental systems using S-controlled
reinforcement procedures.

* Bounded $\gamma'$-systems have been called A-systems in Ref. 41.
9.1.1 Terminal Value Distribution in a Bounded $\alpha$-system

Suppose an $\alpha$-perceptron has upper and lower limits $L$ and $\ell$
for the values $v_{ij}$. Suppose a particular connection, $c_{ij}$, receives
a reinforcement of +1 with probability $p$, -1 with probability $q$, and 0
with probability $1 - p - q$. If all stimuli are equiprobable, and the
perceptron is trained by an S-controlled procedure, this would correspond
to a connection from an A-unit with bias ratio $p/q$ (see Definition, Page 77).
It is assumed in the following analysis that the reinforcements occurring at
different times are statistically independent. For convenience, $L$ and $\ell$
are taken to be integers. Then the value, $v_{ij}$, may assume any one of
$L - \ell + 1$ distinct states ($\ell, \ell + 1, \ldots, L$). Clearly, if unit $a_i$
responds more often to stimuli of the positive class than to stimuli of the
negative class, $v_{ij}$ will tend to grow in a positive direction. Eventually
it will arrive at the limit $L$. At this point, a run of "negative" stimuli
may bring it down again, but it can never exceed $L$. If the unit has a
negative bias, $v_{ij}$ will similarly tend to remain in the neighborhood of
the lower limit, $\ell$. The problem is to find the terminal probability
distribution (if one exists) for the value $v_{ij}$, as the duration $T$ of the
training sequence goes to infinity.

In the following analysis, it will first be assumed that a stable
terminal probability distribution for $v_{ij}$ exists, which will not be
altered by the addition of more stimuli to the training sequence. On the
basis of this assumption, an equation for the distribution can be found. It
will then be proven by induction that the proposed distribution is, in fact,
a stable probability distribution.

Let $\pi(x)$ = probability that $v_{ij} = x$ in the terminal
probability distribution. Let $\pi(\ell) = c$. This will be equal to the
probability of arriving at $\ell$ from above, plus the probability that
$v_{ij}$ remains in state $\ell$ if it is already there. Thus,

$$\pi(\ell) = c = q\,\pi(\ell) + q\,\pi(\ell + 1) + (1 - p - q)\,\pi(\ell)$$

Hence

$$\pi(\ell + 1) = \frac{p}{q}\,c \tag{9.1}$$

For any integer $x$, $0 < x < L - \ell$,

$$\pi(\ell + x) = p\,\pi(\ell + x - 1) + q\,\pi(\ell + x + 1) + (1 - p - q)\,\pi(\ell + x)$$

Hence,

$$\pi(\ell + x + 1) = \frac{(p + q)\,\pi(\ell + x) - p\,\pi(\ell + x - 1)}{q} \tag{9.2}$$

Thus, all values of $\pi(x)$ can be computed if the probability $c$ of $v_{ij}$
being at the lower limit is known. Since the sum of $\pi(x)$ for all possible
values of $v_{ij}$ must be 1, the value of $c$ can be obtained from the
equation:

$$\sum_{x=\ell}^{L} \pi(x) = 1 \tag{9.3}$$
For the distribution to be stable, it is sufficient that the
probability of $v_{ij}$ being at its upper limit satisfies the equation

$$\pi(L) = p\,\pi(L - 1) + (1 - q)\,\pi(L) \tag{9.4}$$

By induction on $x$, it will be shown that

$$\pi(\ell + x) = p\,\pi(\ell + x - 1) + (1 - q)\,\pi(\ell + x), \qquad \pi(\ell + x) = c \left(\frac{p}{q}\right)^{x} \tag{9.5}$$

for $1 \le x \le L - \ell$. (9.4) is only a special case of (9.5).

To begin with, for $x = 1$, we have $\pi(\ell) = c$ and from (9.1),
$\pi(\ell + 1) = (p/q)\,c$. This clearly agrees with (9.5). Now assume (9.5) is
true for $1 \le x \le r < L - \ell$. That is

$$\pi(\ell + r) = \frac{p}{q}\,\pi(\ell + r - 1) = c \left(\frac{p}{q}\right)^{r}$$

But by (9.2), letting $x = r$, we then obtain

$$\pi(\ell + r + 1) = \frac{(p + q)\,\pi(\ell + r) - p\,\pi(\ell + r - 1)}{q} = \frac{(p + q)\,\pi(\ell + r) - q\,\pi(\ell + r)}{q} = \frac{p}{q}\,\pi(\ell + r) = c \left(\frac{p}{q}\right)^{r+1}$$

Thus, having assumed (9.5) to be true for $x \le r$, we find that it is
also true for $x = r + 1$; consequently it is true for all $x$, and (9.5)
must be true. From (9.5) it is also clear that the quantities $\pi(x)$ will
all be non-negative, so that the function $\pi(x)$ meets the requirements for
a probability distribution.
Equation (9.5) can be used to compute $\pi(x)$ by assuming an
arbitrary value for $c$, and then normalizing the distribution as in (9.3).
The equation can be simplified by taking the lower limit, $\ell$, equal to
zero, and setting $c = 1$ for the unnormalized distribution. Then
$\pi(x) = (p/q)^{x}$ prior to normalization. For the normalized distribution
(with $p \ne q$),

$$\pi(x) = \frac{(p/q)^{x}}{\sum_{y=0}^{L} (p/q)^{y}} = \frac{1 - p/q}{1 - (p/q)^{L+1}} \left(\frac{p}{q}\right)^{x}$$

This completes the proof of the following theorem:

THEOREM: In a bounded $\alpha$-perceptron, with S-controlled reinforcement,
the probability distribution $\pi(x)$ (for the value of a particular
connection) approaches a stable terminal distribution of the form
$\pi(x) = c\,(p/q)^{x - \ell}$, where $c$ is a normalization constant equal to
$\dfrac{1 - p/q}{1 - (p/q)^{L - \ell + 1}}$ (for $p \ne q$).
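
As a numerical check on the theorem, the following minimal sketch builds the geometric distribution and applies one further reinforcement step of the bounded random walk to it. The function names and the parameter values (`p = 0.3`, `q = 0.4`, limits of -5 and 5) are illustrative assumptions, not values from the text.

```python
def terminal_distribution(p, q, lower, upper):
    """Return {x: pi(x)} with pi(x) proportional to (p/q) ** (x - lower)."""
    ratio = p / q
    weights = [ratio ** k for k in range(upper - lower + 1)]
    total = sum(weights)
    return {lower + k: w / total for k, w in enumerate(weights)}


def one_step(dist, p, q, lower, upper):
    """Push a distribution through one reinforcement step.

    The value moves up with probability p, down with probability q, and
    stays put otherwise; moves past a limit are clipped to that limit.
    """
    new = {x: 0.0 for x in dist}
    for x, mass in dist.items():
        new[min(x + 1, upper)] += p * mass
        new[max(x - 1, lower)] += q * mass
        new[x] += (1.0 - p - q) * mass
    return new


pi = terminal_distribution(p=0.3, q=0.4, lower=-5, upper=5)
pi_next = one_step(pi, p=0.3, q=0.4, lower=-5, upper=5)
max_change = max(abs(pi_next[x] - pi[x]) for x in pi)
```

If the distribution is stable in the sense used above, `max_change` should be zero up to floating-point rounding.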


Figure 31 shows the probability distribution for $v_{ij}$ for
several values of $p/q$ and for 40 increments between the upper and lower
limits. (The distributions are symmetric for equivalent values of $q/p$,
with upper and lower limits reversed.) Note that with even a slight bias
there is a very low probability that $v_{ij}$ will have a sign
opposite to the bias. For one such bias ratio, for example (and taking
$\ell = -20$, $L = 20$, as in the figure) the probability of a positive
$v_{ij}$ in the terminal distribution is only .0097. If the range were half
as great (20 increments instead of 40) the probability of positive $v_{ij}$
for the same conditions would be increased to .2295.




The frequency of possible ratios $p/q$ for A-units responding
to horizontal and vertical bars can be determined from Table 1. From this,
it is clear that the majority of units have a pronounced bias towards one
class or the other, so that one might expect to find the majority of active
connections having values in the neighborhood of the appropriate limit, $L$
or $\ell$. This heuristic argument supports the conjecture that the bounded
system should still be capable of learning discrimination tasks in S-controlled
experiments, even though the system tends to "saturate", with all values in
the neighborhood of the upper or lower limit. The quantitative performance
of such systems will be taken up in Section 9.1.3.


9.1.2 Terminal Value Distribution in Bounded $\gamma$-systems

In a bounded $\gamma$-perceptron, the analysis of the terminal
distribution for $v_{ij}$ is complicated by two considerations. First, there
are at least four possible values of $\Delta v_{ij}$, namely $1 - Q_i$,
$-(1 - Q_i)$, $-Q_i$, and $Q_i$, each with its own probability. If $Q_i$ is
not equal for all stimuli, the number of possible values for $\Delta v_{ij}$ is
increased in proportion to the number of different values for $Q_i$. The second
consideration is that the conservation rule, which requires the sum of all
values to remain constant, makes the admissible increment for one
connection dependent on how many of the other connections are currently
free to move. For example, if all of the "active" connections have values
equal to $L$, the expected decrement, $-Q_i$, for the inactive connections
due to the application of a positive reinforcement cannot occur.
Due to these complications, an analysis for a true $\gamma$-system
has never been carried out. An analysis has been completed by Joseph
for a $\gamma'$-system with monopolar reinforcement (i.e., reinforcement
is applied only for stimuli of the positive class, and $\eta = 0$ for stimuli
of the negative class). In this case there are only two non-zero changes
which might occur, one for active connections and one for inactive
connections, and the reinforcement of a given connection does not depend
on the state of any other parallel connections, as it does in the $\gamma$-system.
The analysis is a somewhat more complicated form of that presented in the
preceding section (due to the inequality of positive and negative changes in
$v_{ij}$). Since the equations are of limited interest aside from the specific
model considered, they will not be repeated here, but they can be found,
together with typical distribution curves, in Ref. 41.

9.1.3 Performance of Bounded $\alpha$-systems in S-controlled Experiments

From the preceding analysis, it is clear that with a large
number of increments between the upper and lower limits of $v_{ij}$, the
value will ultimately tend to remain in the neighborhood of the upper or
lower bound, depending upon the bias ratio of $a_i$. In the following
analysis, the problem is simplified by assuming that the limits are
actually trapping, so that once a connection has arrived at value $L$ or
$\ell$, it remains there permanently, regardless of future reinforcement.

Consider a basic training sequence of $m$ stimuli, $S_1, \ldots, S_m$,
which is then repeated a sufficient number of times to "saturate" the
system, i.e., to drive all biased values to their limits. If the value of a
connection is $v$ after the first $m$ stimuli, then after $r$ repetitions
of the training sequence, the value will be

$\min(L, rv)$ if $v > 0$

$\max(\ell, rv)$ if $v < 0$

$0$ if $v = 0$

for a bounded $\alpha$-system. An unbounded $\alpha$-system will have the same
performance after $r$ repetitions of the training sequence as after a single
repetition, since all of its values are simply multiplied by $r$. The
following analysis compares the performance of the "saturated" bounded
$\alpha$-system with that of the unbounded $\alpha$-system
at the end of the training sequence. The analysis will be accurate for the
assumption of a large range between $L$ and $\ell$, so that after the first $m$
stimuli none of the values have reached their limits.
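
The trapping rule can be written out as a short sketch, taking the lower limit as the floor for negative values. The function name and the example limits of -20 and 20 are illustrative assumptions.

```python
def saturated_value(v, repetitions, lower, upper):
    """Value of a connection after repeating the training sequence.

    v is the value after the first pass; each repetition adds the same
    net change until the value is trapped at a limit.
    """
    if v > 0:
        return min(upper, repetitions * v)
    if v < 0:
        return max(lower, repetitions * v)
    return 0


# A strongly biased, a weakly biased, and an unbiased connection.
examples = [
    saturated_value(v, repetitions=50, lower=-20, upper=20)
    for v in (3, -1, 0)
]
```

After enough repetitions the strongly and weakly biased connections sit at limits of equal magnitude, so the size of the original bias no longer shows in the saturated values.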

Let $P_x$ be the probability that $r^* = +1$ for test stimulus $S_x$,
for the unbounded $\alpha$-system, and $P_x'$ be the corresponding probability
for the bounded $\alpha$-system. Then the conditional probability $(P_x' \mid P_x)$
gives the performance of the bounded system as a function of the performance
of the unbounded system (which is known from Chapter 7).

Suppose $N_x$ A-units are activated by the test stimulus, $S_x$.
Then for the unbounded system, $P_x = \Phi(Z)$, where $\Phi$ is the cumulative
distribution function defined by equation (7.7) and

$$Z = \sqrt{N_x}\;\frac{E(v_x)}{\sigma(v_x)}$$

where $E(v_x)$ = expected value of a connection activated by $S_x$, and
$\sigma(v_x)$ = standard deviation of such a connection. The bounded
$\alpha$-system, on the other hand, will give response +1 if the proportion of
the active connections having value $L$ is greater than $-\ell/(L - \ell)$. If
$\ell = -L$, then this reduces to a requirement that the number of active
connections having value $L$ should be greater than the number having value
$\ell$. The connections having value 0 may be ignored. As with the unbounded
system, it is assumed that after the first $m$ stimuli, $v_{ij}$ is normally
distributed with expected value $E(v_x)$ and variance $\sigma^2(v_x)$. This
assumption is reasonable if the range of $v_{ij}$, $(L - \ell)$, is greater
than $2m$ and $m$ is fairly large. If the range of $v_{ij}$ is less than $2m$,
the analysis can be considered only an approximation, which becomes
increasingly poor as the range diminishes.


Under these conditions, in the bounded system, the probability
that the terminal value of a connection is $L$ is equal to the probability that
$v_{ij}$ is positive after the first $m$ stimuli. This is equal to
$\Phi(Z/\sqrt{N_x})$. Since $\Phi$ is a cumulative probability distribution it
is a one-to-one function from its domain to its range, and is therefore
invertible. Thus, given $N_x$ and $P_x$, the probability that a connection
activated by $S_x$ goes to value $L$ will be:

$$\rho = \Phi\!\left(\frac{\Phi^{-1}(P_x)}{\sqrt{N_x}}\right) \tag{9.6}$$

and this yields

$$(P_x' \mid P_x, N_x) = \sum_{k=\kappa}^{N_x} \binom{N_x}{k}\,\rho^{k}\,(1 - \rho)^{N_x - k} \tag{9.7}$$

where $\kappa = \left[\dfrac{-\ell\,N_x}{L - \ell}\right]$, the notation $[n]$
indicating the least integer greater than or equal to $n$. To obtain
$(P_x' \mid P_x)$, the expectation of (9.7) with respect to $N_x$ is required.
For reasonably large values of $N_a$, $N_x \approx Q_x N_a$. Substituting
$Q_x N_a$ for $N_x$ finally yields:

$$(P_x' \mid P_x) = \sum_{k=\kappa}^{Q_x N_a} \binom{Q_x N_a}{k}\,\rho^{k}\,(1 - \rho)^{Q_x N_a - k} \tag{9.8}$$

where

$$\rho = \Phi\!\left(\frac{\Phi^{-1}(P_x)}{\sqrt{Q_x N_a}}\right), \qquad \kappa = \left[\frac{-\ell\,Q_x N_a}{L - \ell}\right]$$

In Figure 32, the conditional probability of error in a bounded
$\alpha$-perceptron is shown as a function of the error probability $(1 - P_x)$
for the unbounded system, for several values of $N_x$, with $|\ell|/(L - \ell)$
taken to be 1/2. Curves of this function for cases where upper and lower
limits are not symmetric can be found in Joseph, Ref. 41 (Figures 10-14).

Figure 32 CONDITIONAL ERROR PROBABILITY FOR BOUNDED $\alpha$-SYSTEM vs. ERROR
PROBABILITY FOR UNBOUNDED SYSTEM ($\ell = -L$)
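
The comparison of bounded and unbounded performance can be sketched numerically. The code below assumes that the distribution function of equation (7.7) is the standard normal, treats the per-connection outcomes as independent, and uses a binomial tail for the proportion test; the function name and the example arguments are hypothetical.

```python
from math import ceil, comb, sqrt
from statistics import NormalDist

PHI = NormalDist()  # standard normal, assumed to match equation (7.7)


def bounded_given_unbounded(p_unbounded, n_active, lower, upper):
    """Probability of a +1 response in the saturated bounded system.

    p_unbounded is the corresponding probability for the unbounded
    system and n_active is the number of A-units the stimulus activates.
    """
    z = PHI.inv_cdf(p_unbounded)
    # Chance that a single active connection ends at the upper limit.
    p_limit = PHI.cdf(z / sqrt(n_active))
    # Least number of connections at the upper limit that gives +1.
    threshold = ceil(-lower * n_active / (upper - lower))
    return sum(
        comb(n_active, k) * p_limit ** k * (1.0 - p_limit) ** (n_active - k)
        for k in range(threshold, n_active + 1)
    )


chance = bounded_given_unbounded(0.5, n_active=11, lower=-20, upper=20)
degraded = bounded_given_unbounded(0.99, n_active=101, lower=-20, upper=20)
```

With symmetric limits a chance-level unbounded system stays at chance, while a highly reliable unbounded system loses some reliability after saturation because every connection then casts an equal vote.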

9.1.4 Performance of Bounded $\gamma$-systems in S-controlled Experiments

The analysis in the preceding section, and the curves shown
in Fig. 32, can be applied without modification to bounded $\gamma'$-perceptrons.
The true $\gamma$-system, however, may perform somewhat better than the
$\gamma'$-system, since not all values can "saturate" independently. If more
than half of the connections have a positive bias, for example, not all of
the positively biased connections can go to the limit $L$, since this would
require that the remaining connections take on values less than $\ell$,
in order to satisfy the conservation rule.* In the $\gamma$-system, therefore,
we would expect a greater number of connections to remain at intermediate
values, rather than going to the limits, and this should result in
a "compromise" between the performance of an unbounded and a bounded
value system. An exact analysis of the $\gamma$-system has not been carried out.

* It is assumed here that $\sum v_{ij} = 0$, $\ell = -L$.

9.2  Analysis  of  Systems  with  Decaying  Values 


The bounded value systems have two disadvantages relative to
the "ideal" unbounded systems. First, they permit a smaller number of
memory states, and second, in S-controlled experiments they tend to
arrive at a saturation condition in which their performance is actually
poorer than that obtained during the transient learning phase; that is,
their performance curve first increases to a maximum, and then declines
to a terminal asymptote as the system saturates. The first disadvantage is
not serious, if the range of $v_{ij}$ is reasonably large. The second may be
more critical, since it means that units with a low "utility" for a given
discrimination are pulling as much weight in the saturated system as units
with high utility (as measured by their bias ratios). In the cross-coupled
perceptrons considered in Part III, this latter consideration is more
salient than in elementary perceptrons.

An alternative value-limiting mechanism, which is also of
interest due to its apparent biological plausibility, is obtained by allowing
the values to decay exponentially towards a resting state (generally taken
to be zero). This mechanism is relatively free from the difficulties
encountered in the bounded value system. In this model, $v_{ij}$ will
continue to grow in the direction determined by the bias ratio of $a_i$, until
the expected rate of reinforcement is exactly balanced by the rate of decay.
At this point a dynamic equilibrium will occur, with $v_{ij}$ tending to
fluctuate about the equilibrium level. This means that connections which are
frequently reinforced, in a consistent direction, will attain higher values,
in the limit, than infrequently reinforced connections, or connections with
low bias.

Consider an $\alpha$-system with decaying values. Let the decay
rate be equal to $\delta v_{ij}$. Let the probabilities of positive and negative
increments to $v_{ij}$ be $p$ and $q$, as in the analysis of bounded
$\alpha$-systems. As long as $\delta$ is small, $v_{ij}$ will tend to approach
an expected asymptotic value equal to $(p - q)/\delta$. At this point, the
expected rate of gain, per unit time, is $p - q$, and the expected rate of
loss is $\delta v_{ij} = p - q$. If the value of $\delta$ is very small, and the
relaxation time correspondingly long relative to the expected recurrence rate
of stimuli from the environment, this system should approach as a limit the
same performance as the unbounded $\alpha$-system, where $v_{ij}$ tends to grow
in proportion to $p - q$. If $\delta$ is somewhat larger, however, we find that
the most recent stimuli in the training sequence will have the most
pronounced effect, progressively earlier stimuli exerting a progressively
diminishing effect due to the decay of $v_{ij}$. Such a perceptron tends to
forget its remote experience in favor of more recent experience.


The dependence of these systems on the sequence as well as
the identity of training stimuli makes them difficult to analyze when the
relaxation time, or "half-life" of $v_{ij}$ is on the same order as, or
shorter than, the training sequence. If $\delta$ is sufficiently small,
performance can be assumed identical to the unbounded system. An absolute
bound on the maximum attainable magnitude of $v_{ij}$ for a decaying value
perceptron will be $1/\delta$, corresponding to a situation in which $c_{ij}$ is
reinforced continuously in the same direction.
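
The equilibrium level and the absolute bound can both be checked with a small sketch of the expected value under decay. It assumes a discrete-time form in which each step first shrinks the value by the factor `1 - delta` and then adds the expected increment `p - q`; the function name and parameter values are illustrative, not taken from the text.

```python
def expected_value_after(steps, p, q, delta, start=0.0):
    """Expected connection value after a number of reinforcement steps.

    Each step applies proportional decay followed by the mean increment
    (discrete-time assumption).
    """
    v = start
    for _ in range(steps):
        v = (1.0 - delta) * v + (p - q)
    return v


delta = 0.01
# A moderately biased connection settles near (p - q) / delta.
asymptote = expected_value_after(5000, p=0.3, q=0.1, delta=delta)
# A connection reinforced positively on every step settles near 1 / delta.
ceiling = expected_value_after(5000, p=1.0, q=0.0, delta=delta)
```

Smaller values of `delta` raise both levels and lengthen the time needed to reach them, which matches the remark that a very slowly decaying system behaves like the unbounded one over a training sequence of moderate length.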

9.3  Experiments  with  Decaying  Value  Perceptrons 


9.3.1  S-controlled  Discrimination  Experiments 

The essential features of S-controlled discrimination experiments
with decaying value perceptrons have already been noted in the
preceding section. If the decay rate is small, the decaying value system
approaches the performance of the corresponding "ideal" or unbounded
system. If the decay rate is relatively large, forgetting occurs, which is
greatest for temporally remote events and negligible for recent events in
the training sequence.

9.3.2 Error-correction Experiments

In discrimination experiments with error corrective reinforcement,
a more complicated situation exists than in the case of S-controlled
experiments. In the error correction system, once the
perceptron has learned a task, reinforcement ceases, and the values
of a decaying system would be expected to decay back towards zero.
In a perfectly noise-free system, the values would all decay in proportion
to their magnitudes, however, and consequently their ratios would never
change as long as no further reinforcement was applied. Thus once perfect
performance is achieved, it will not be lost as long as the values
remain above the noise-level of the system, despite the decay effect.

This also means that if a "run" of correct responses occurs during
training, the ratios of $v_{ij}$ for different connections will be unaltered,
so that the next error to occur will be no different in the decaying value
model than in the unbounded model. Consequently, the application of
reinforcement just sufficient to correct this error will bring the ratios of
the values to precisely the state that they would have in the unbounded model,
and ability to achieve a solution to a classification problem should be
unaffected, in principle. In actuality, however, the continuously decaying
values clearly present a problem, since any physical system will ultimately
forget, when the values become small enough to be undetectable.
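
The argument that proportional decay leaves responses unchanged can be illustrated with a minimal sketch. It assumes a response given by the sign of the summed values of the active connections; the values, active sets, and function names are hypothetical.

```python
def decay(values, delta, steps):
    """Shrink every value by the same factor, as in unreinforced decay."""
    factor = (1.0 - delta) ** steps
    return [v * factor for v in values]


def response(values, active):
    """+1 if the summed value of the active connections is positive."""
    total = sum(values[i] for i in active)
    return 1 if total > 0 else -1


values = [4.0, -1.5, 0.5, -2.0]
faded = decay(values, delta=0.05, steps=200)

stimuli = ([0, 1], [1, 2], [2, 3], [0, 3], [1, 2, 3])
unchanged = all(response(values, s) == response(faded, s) for s in stimuli)
largest_remaining = max(abs(v) for v in faded)
```

Every response survives the decay, but `largest_remaining` is already far smaller than the starting values, which is where a physical noise floor would begin to matter.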

A variation of the decaying value model is capable of eliminating
the problem caused by the diminution of the values in an unreinforced system.
If $v_{ij}$ is held constant so long as no reinforcement signal is received
from the reinforcement control system, but decays exponentially in the
presence of such a signal, the learning ability of the perceptron will still be
unaltered (by the same argument as above), and no change will occur once the
task has been properly learned. This means that the increment to the value
of $v_{ij}$ at time $t$ will be


$$\Delta v_{ij}(t) = \left[\eta(t) - \delta\,v_{ij}(t)\right]\,|\eta(t)|$$

where $\eta(t)$ may be $+1$, $-1$, or 0.


It should be noted that in the error-correction procedure, the
loss of temporally remote experience with large values of $\delta$ does not
occur, in an ideally functioning (noise-free) system. Unlike the S-controlled
system, where the magnitude of new reinforcements remains unchanged as
the values decay, the error correction procedure will require smaller or
less frequent increments in order to correct an error, and earlier experience
tends to be retained about as well as in the unbounded, or non-decaying
system. A loss of early experience does occur, in such systems, but it is
due to "writing over" earlier memory traces with more recent reinforcement,
rather than to a passive decay, as in the case of the S-controlled system.
This observation would seem to indicate a closer correspondence of the
error-corrective system with what is known of forgetting in biological
systems.


The mean performance curves for eight simulated perceptrons
with three values of the decay rate $\delta$ are shown in Fig. 33. Note that
for these actual systems, there is a progressive deterioration of performance
as the decay rate is increased.

Figure 33 PERFORMANCE CURVES FOR DECAYING VALUE PERCEPTRONS WITH ERROR
CORRECTIVE REINFORCEMENT


9.3.3  R-controlled  Experiments 

The most interesting experimental results obtained to date
with decaying value perceptrons deal with the performance of decaying
$\gamma'$-systems in R-controlled experiments. Experiment 9 has been
studied most extensively, by means of simulation experiments representing
a very large, or infinite, perceptron. Unlike the previous
experiments (discussed in Section 8.3) monopolar reinforcement was
employed, i.e., the perceptron was reinforced positively for $r^* = +1$,
and was not reinforced at all for $r^* = -1$. The system was further
modified by assuming a slight negative quantity to be added to $\Delta v_{ij}(t)$
for all $i$; that is, an invariant negative reinforcement component was
added uniformly to all connections, regardless of what stimulus occurred,
and regardless of the activity state of the connection. In the absence of
any other components, this would cause a progressive downward drift of
all $v_{ij}$ until they achieved an equilibrium with the decay rate. It was
assumed that this negative component was sufficient to add a quantity
equal to -0.0001 to the set of connections activated by a single stimulus.
Thus, apart from the decay, the change in values for each reinforcement
could be expressed by the equation:

/,  ■  ' V,'  '  '  ■  ■  ( '  '  '  I 

The effect of the fixed negative component in these experiments
is to create a negative generalization from the first stimulus to occur
(say a horizontal bar) to all members of the opposite class (vertical bars)
in place of the zero generalization which would otherwise occur with a
$\gamma$-system. The result is that after having seen a single stimulus
which activates a positive response, all members of the opposite class
are thenceforth permanently classified in the negative class, as no
further events can occur which will make one of them positive. If the
initial stimulus is a horizontal bar, then, with monopolar reinforcement,
no vertical bar will be reinforced, since all vertical bars evoke a -1
response. The next stimulus which can possibly be reinforced is, in fact,
another horizontal bar which happens to be close enough to the previous
one to have received positive generalization from the first reinforcement,
i.e., the first or second neighbor on either side. The result is a gradual
growth of the positive stimulus set, by accretion of near neighbors which
have received positive generalization from those bars already classified
as "positive". Thus, having started out by randomly placing a horizontal
bar in the positive class, the system has no choice but to include only
horizontal bars in the positive class, and, with sufficient time, all
horizontal bars are so classified.

While  this  phenomenon  occurs  even  if  the  decay  rate  is  zero, 
it  is  markedly  accelerated  by  a  non-zero  decay  rate.  With  (/  <  ,  the 

perceptron  shows  a  high  degree  of  "rigidity"  in  its  early  classification,  in 
which  some  horizontal  bars  are  positive,  and  the  remainder  still  negative 
(as  in  Section  8.3).  This  is  due  to  the  continually  increasing  magnitude  of 
the  negative  values  evoked  by  the  "incorrectly"  classified  stimuli,  which 
must  be  overcome  in  order  to  change  their  classification.  Thus,  as  time 
progresses, it becomes harder and harder to switch each additional horizontal
bar into the positive class, since an increasingly large number of
"marginal" positive stimuli must be reinforced in order to obtain the
required  amount  of  positive  generalization.  Moreover,  as  the  positive 
class  expands,  the  stimuli  which  are  centrally  located  within  the  "positive 
band"  all  contribute  further  negative  generalization  to  the  remaining 
stimuli,  rather  than  helping  to  make  them  positive.  These  combined  effects 
lead to a convex, negatively accelerating learning curve, as illustrated in
Figure  33.  The  addition  of  a  non-zero  decay  rate  limits  the  negative  value 
which  must  be  overcome  in  order  to  change  the  classification  of  an 
"incorrect"  stimulus,  and  thus  makes  the  system  more  flexible. 

If  the  decay  rate  is  increased  progressively,  it  is  found  that 
there is an optimum at about $\delta = 0.01$. If the decay rate is increased
further,  instability  occurs,  due  to  the  loss  of  stimuli  which  were  previously 
classified  correctly,  but  whose  positive  values  have  decayed  to  such  an 
extent  as  to  be  overcome  by  negative  generalization  from  other  stimuli. 
These  effects  are  shown  both  in  the  learning  curves  of  Fig.  34(a)  and  in 
Fig.  34(b),  which  shows  the  expected  learning  time  to  perfect  performance 
(i.e., perfect dichotomization of horizontal and vertical bars), obtained
from  a  sample  of  10  runs. 

It might seem, from these results, that perceptrons organized
in the manner indicated could be expected to form "meaningful" classifications
of stimuli, on some basis other than retinal position. Unfortunately,
the results, while illuminating, are highly restricted in generality.
The proposed dynamics are too contrived to be biologically plausible, and
it is found that in any environment in which classes of stimuli to be
differentiated permit positive generalization between members of different
classes (a much more usual situation) the mechanism which yields good
separation in the above example breaks down. If $g_{ij}$ between a single
horizontal bar and any of the vertical bars were positive, for example,
the spread of generalization would not stop with the members of the
horizontal class, in the above case, but would invade the opposite class
as well. If, instead of 4 by 20 horizontal and vertical bars, the perceptron
is  confronted  with  an  environment  consisting  of  the  twenty  horizontal  bars 
and  a  set  of  twenty  pairs  of  parallel  2  by  20  horizontal  bars,  separated  by 
a  space  of  3  units  on  the  retina,  the  perceptron  will  not  spontaneously  learn 
to  distinguish  single  bars  from  double  bars  (although  this  task  presents  no 
difficulty  in  an  S-controlled  experiment). 

Another  shortcoming  of  the  spontaneous  organization  phenomenon 
which  has  been  demonstrated  here  is  the  basically  unbiological  character  of 
the learning curves. It has already been noted that these curves are convex,
or  decelerating.  A  human  subject,  or  even  an  animal  subject,  confronted 
with  the  problem  of  distinguishing  horizontal  from  vertical  bars  might  make 
many  mistakes  initially,  but  would  soon  accelerate  his  learning  as  he  began 
to  generalize  to  new  stimuli.  If  he  had  a  hundred  bars,  in  different  retinal 
positions,  to  classify,  the  hundredth  bar  would  certainly  not  present  the 
almost insurmountable obstacle that it represents for the elementary perceptron.
Thus it is clear that the most sophisticated generalization phenomena
which have yet been found in elementary perceptrons are still far
short of what one should expect from an adequate brain model, if biological
standards are employed. This problem will be re-examined at greater
length in Part III, where it will be seen that multi-layer and cross-coupled
perceptrons perform such tasks in a much more suitable fashion than those
systems which have been considered thus far.

This completes the presentation of elementary perceptrons. In
the following chapters, some other types of minimal (S-A-R) perceptrons
will be considered, but it will be seen that none of these have capabilities
for generalization appreciably beyond those discovered in the elementary
systems.


10. SIMPLE PERCEPTRONS WITH NON-SIMPLE A AND R-UNITS


In Chapter 4, a simple perceptron was defined as one which
satisfies the following five conditions:

1. There is a single R-unit, with a connection from every A-unit.

2. The perceptron is series coupled, with an S-A-R topology.

3. The values of all S-A connections are invariant.

4. Transmission times of all connections are equal ($\tau$ generally
taken as 0).

5. All signals generated by S, A, and R-units are functions of
the algebraic sum of input signals arriving simultaneously
at the unit.

In the preceding chapters, we have considered elementary
perceptrons, which are characterized by the additional constraints that all
A and R-units are "simple" units, and that the transmission function of the
connection $c_{ij}$ takes the form $c_{ij}^*(t) = a_i^*(t - \tau)\, v_{ij}(t)$. A
simple A-unit is a signal generating unit which emits an output signal
$a_i^* = +1$ if the algebraic sum of the input signals, $\alpha_i$, is equal to
or greater than the threshold $\theta$, and 0 otherwise. A simple R-unit
emits a +1 signal if the sum of its input signals is strictly positive, and -1
if the sum of its inputs is strictly negative. In this chapter, we shall
consider the properties of simple perceptrons in which these constraints
are dropped. This will include a brief consideration of linear networks
in which all signals are transmitted in proportion to their value; the
properties of perceptrons with linear R-units but non-linear A-units will
then be considered, and finally the question of optimum transmission functions
will be discussed. In later chapters, the remaining constraints of
simple perceptrons will be modified, and a number of non-simple systems
will be analyzed.

10.1  Completely  Linear  Perceptrons 

A completely linear perceptron is one in which all signal functions
and transmission functions are linear, i.e., the output of unit $u_i$ is of the
form $u_i^* = \alpha_i$, and the signal transmitted by a connection $c_{ij}$ is
of the form $c_{ij}^* = u_i^* v_{ij}$. We will consider linear perceptrons in
environments such that the inputs to an S-unit are either 1 or 0 (so that the
conclusions apply equally well to perceptrons which are linear everywhere
except in the S-units). By analogy to Section 5.4, we define the bias ratio
of an S-unit as $n^+/n^-$, where $n^+$ is the number of positive stimuli, and
$n^-$ the number of negative stimuli which activate the S-unit. For such
systems, the following theorem holds:

THEOREM 1: Given a completely linear perceptron, a stimulus world,
$W$, and a classification $C(W)$ such that the bias ratio of
every S-unit is equal (and non-zero), no solution to $C(W)$
can exist.

PROOF: Let

$k^+$ = index of any stimulus in the positive class ($S_{k^+}$),
$k^-$ = index of any stimulus in the negative class,
$j$ = index of a sensory unit,
$c_{ji}^{(k)}$ = signal transmitted from the $j$th sensory unit
to the $i$th A-unit in response to stimulus $S_k$.

When stimulus $S_k$ occurs, unit $a_i$ transmits a signal equal
to $\alpha_i^{(k)} v_i$ to the R-unit, where

$$\alpha_i^{(k)} = \sum_j c_{ji}^{(k)}$$

The total signal, $u^{(k)}$, received by the R-unit from $S_k$ is therefore:

$$u^{(k)} = \sum_i \alpha_i^{(k)} v_i = \sum_i \sum_j c_{ji}^{(k)} v_i$$

Since every signal must agree in sign with the classification of $S_k$
for a solution to exist, we require that the following inequalities be satisfied:

$$\sum_{k^+} \sum_i \sum_j c_{ji}^{(k^+)} v_i > 0 \qquad (10.1)$$

$$\sum_{k^-} \sum_i \sum_j c_{ji}^{(k^-)} v_i < 0 \qquad (10.2)$$

But it has been stipulated that the bias ratio of each S-point is equal to a
constant, $r > 0$. This means that, for any $i$ and $j$,

$$\sum_{k^+} c_{ji}^{(k^+)} = r \sum_{k^-} c_{ji}^{(k^-)} \qquad (r > 0)$$

or, summing over S-units,

$$\sum_j \sum_{k^+} c_{ji}^{(k^+)} = r \sum_j \sum_{k^-} c_{ji}^{(k^-)}$$

Substituting in the expressions (10.1) and (10.2) we get the contradiction

$$r \sum_{k^-} \sum_i \sum_j c_{ji}^{(k^-)} v_i > 0, \qquad \sum_{k^-} \sum_i \sum_j c_{ji}^{(k^-)} v_i < 0$$

which proves that a solution cannot exist.


This  means  that  if  two  stimulus  patterns  are  placed  in  all 
possible  positions  on  a  retina,  the  resulting  classes  of  stimuli  cannot 
be  correctly  discriminated  by  a  linear  perceptron.  As  a  consequence, 
such  systems  are  relatively  uninteresting,  even  though  they  may  successfully 
discriminate  a  moderate  number  of  patterns  which  are  restricted  to  limited 
positions  on  the  retina.  In  all  systems  considered  from  here  on,  there  will 
be at least one set of non-linear components subsequent to the S-units in
the  perceptron  network. 
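
The consequence of Theorem 1 can be checked numerically. The following is a minimal sketch and not part of the original analysis. It assumes a one-dimensional ring retina of 12 S-units (so that every position of a pattern is available), a wide bar as the positive pattern, a narrow bar as the negative pattern, and randomly chosen connection values; the names `shifted_stimuli`, `V_sa`, and `v_ar` are illustrative only. Every S-unit then has the same bias ratio, and the summed R-unit input over the positive class is exactly r times the summed input over the negative class, so the two classes cannot receive inputs of opposite sign throughout.

```python
import numpy as np

def shifted_stimuli(pattern):
    """Return every circular shift of a 0/1 pattern, one stimulus per row."""
    return np.array([np.roll(pattern, k) for k in range(len(pattern))])

n_s, n_a = 12, 7                               # illustrative numbers of S-units and A-units
wide = np.array([1] * 4 + [0] * (n_s - 4))     # positive pattern (4 active points)
narrow = np.array([1] * 2 + [0] * (n_s - 2))   # negative pattern (2 active points)
positive = shifted_stimuli(wide)
negative = shifted_stimuli(narrow)

# Bias ratio n+/n- of each S-unit: the same value for every unit.
bias_ratio = positive.sum(axis=0) / negative.sum(axis=0)

# A completely linear perceptron with arbitrary (here random) connection values.
rng = np.random.default_rng(0)
V_sa = rng.normal(size=(n_s, n_a))             # S-to-A connection values
v_ar = rng.normal(size=n_a)                    # A-to-R connection values

u_positive = positive @ V_sa @ v_ar            # R-unit input for each positive stimulus
u_negative = negative @ V_sa @ v_ar            # R-unit input for each negative stimulus

r = bias_ratio[0]
sums_proportional = bool(np.isclose(u_positive.sum(), r * u_negative.sum()))
is_solution = bool(np.all(u_positive > 0) and np.all(u_negative < 0))
print(r, sums_proportional, is_solution)
```

Changing the seed or the connection values should leave the outcome unchanged: the sums stay proportional, so no choice of values yields a solution.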

10.2  Perceptrons  with  Continuous  R-units 


The next type of perceptron to be considered has simple A-units,
but continuous R-units, such that the response $r^* = f(u)$, with $f$ an
arbitrary monotonic function of $u$. This includes the case of linear
R-units, where $r^* = u$. An important theorem which is
analogous to Theorem 4 of Chapter 5 deals with the ability of such systems
to learn arbitrary response functions (Definition 27, Chapter 4) under the
error correction procedure. A response function assigns an arbitrary
output signal (rather than just $\pm 1$) to every stimulus in $W$. We first
prove the following Lemma:

LEMMA 1: Given a symmetric positive definite or positive semidefinite
matrix, $H$, and any vector $y$, then $y'Hy = 0$ only if $Hy = 0$.

PROOF: Since $H$ is positive definite or semidefinite, there exists a
matrix $B$ such that $H = B'B$. Then

$$0 = y'Hy = y'B'By = (By, By)$$

$$\Rightarrow By = 0 \Rightarrow B'By = 0 = Hy.$$
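
Lemma 1 can be illustrated numerically. The sketch below is an illustration only, with an arbitrarily chosen matrix `B` that does not come from the text. It forms a positive semidefinite H = B'B, takes a vector y from its null space, and confirms that y'Hy and Hy vanish together, while a vector outside the null space gives a strictly positive quadratic form.

```python
import numpy as np

# An arbitrary 2 x 4 matrix B, so that H = B'B is 4 x 4, symmetric,
# positive semidefinite, and singular (rank 2).
B = np.array([[1.0, 2.0, 0.0, -1.0],
              [0.0, 1.0, 1.0, 1.0]])
H = B.T @ B

# The right singular vector belonging to the smallest singular value
# lies in the null space of H.
_, singular_values, Vt = np.linalg.svd(H)
y = Vt[-1]

quadratic_form = float(y @ H @ y)         # y'Hy, zero up to rounding
Hy_norm = float(np.linalg.norm(H @ y))    # |Hy|, zero up to rounding

# A vector outside the null space gives z'Hz = |Bz|^2 > 0.
z = np.array([1.0, 0.0, 0.0, 0.0])
z_form = float(z @ H @ z)
```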


THEOREM 2: Given a simple perceptron with simple A-units, an
R-unit with a continuous monotonic sign-preserving
signal generating function, a stimulus world $W$ (in which
each stimulus ultimately reoccurs) and any response
function $R(W)$ for which a solution exists, then by
means of the error-corrective reinforcement procedure,
the given response function can always be approximated in
finite time by an output vector $R(W) + \epsilon$, where $\epsilon$
is a vector of elements $(\epsilon_1, \epsilon_2, \ldots, \epsilon_n)$, $|\epsilon_i| < \epsilon'$,
where $\epsilon'$ may be an arbitrarily small quantity greater
than zero.

PROOF: The following proof was suggested by R. D. Joseph. From
Theorem 3 of Chapter 5, we know that under the conditions of the theorem,
a solution $x^0$ to the equation $Gx = u^0$ exists. Suppose the system is
currently in the state $x$, represented by $Gx = u$. From the definition
of the G-matrix, and the fact that every stimulus must activate at least
one A-unit for a solution to exist, we have

$$\min_i g_{ii} > 0$$

The difference between the solution vector $u^0$ and the present state $u$
is given by

$$G(x^0 - x) = u^0 - u$$

Let $x^0 - x = y$ and

$$u^0 - u = \mu$$

Then $Gy = \mu$.

We wish to show that by applying an error correction method to one
component at a time of the vector $x$, $\mu$ must ultimately go to a point
within the $\epsilon''$ cube about 0. (The method will apply a correction of the
proper size until a response $r_i^* = R_i$ is obtained.) We know that $g_{ii} > 0$.
Therefore, for the difference, $\Delta F$, produced in the quadratic form
$F = y'Gy$ by a change $\Delta y_i$ in the single component $y_i$, we have

$$\Delta F = 2\mu_i\, \Delta y_i + g_{ii} (\Delta y_i)^2$$

Since $G$ is non-negative definite, we know that $F = y'Gy \ge 0$,
and from Lemma 1 we know that $F > 0$ if $\mu \ne 0$. Therefore, if
$\mu_i > 0$ decreases as a result of decreasing $y_i$, $F$ decreases; also,
if $\mu_i < 0$ increases by increasing $y_i$, $F$ decreases (see Proof of
Theorem 4, Chapter 5). To prove the theorem, it is sufficient to show
that this implies that $\mu$ must ultimately enter the $\epsilon''$ cube about zero.

Let

$\mu_i^0$ = initial value of $\mu_i$ at start of a correction step,
$y_i^0$ = initial value of $y_i$ at start of a correction step.

Then for the correction, which sets $\Delta y_i = -\mu_i^0 / g_{ii}$, we have

$$\Delta F = -\frac{(\mu_i^0)^2}{g_{ii}}$$

Therefore, since a correction is applied only when $|\mu_i^0| > \epsilon''$,
$\Delta F < -\epsilon''^2 / g_{ii} < 0$.
Hence, there can be only a finite number of corrections, since $F \ge 0$,
and the vector $Gy = u^0 - u$ must converge to a point within the $\epsilon''$ cube
about zero. But $u$ is the input to the R-unit. Since $r^*(u)$ is continuous,
there exists an $\epsilon''$ such that $|r^*(u + \delta) - r^*(u)| < \epsilon'$ if $|\delta| < \epsilon''$.
Therefore the response function converges together with the vector $Gx$. Q.E.D.
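
The convergence argument of Theorem 2 can be tried on a small example. The following is a sketch under stated assumptions, not the procedure as specified in Chapter 5: it assumes a linear R-unit (so the required response is simply the required input), 0/1 A-unit activity, and a correction that shares the whole error equally among the active connections so that the corrected stimulus is answered exactly. The function name `error_correction` and the test data are hypothetical. A solution is guaranteed to exist because the targets are generated from a hidden set of connection values.

```python
import numpy as np

def error_correction(activity, target, eps=1e-3, max_sweeps=10_000):
    """Cycle through the stimuli, correcting any whose R-unit input is off by more than eps.

    activity[i, j] is 1 if A-unit j is active for stimulus i, else 0.
    Returns the A-to-R connection values and the number of corrections made.
    """
    values = np.zeros(activity.shape[1])
    corrections = 0
    for _ in range(max_sweeps):
        corrected = False
        for i, row in enumerate(activity):
            error = target[i] - row @ values
            if abs(error) > eps:
                # Share the error equally among the active connections, so that
                # the input for stimulus i becomes exactly target[i].
                values += row * error / row.sum()
                corrections += 1
                corrected = True
        if not corrected:
            return values, corrections
    raise RuntimeError("no convergence within max_sweeps")

rng = np.random.default_rng(1)
activity = (rng.random((8, 20)) < 0.4).astype(float)
activity[np.arange(8), np.arange(8)] = 1.0   # every stimulus activates at least one A-unit
hidden = rng.normal(size=20)                 # guarantees that a solution exists
target = activity @ hidden                   # required R-unit inputs

values, corrections = error_correction(activity, target)
max_error = float(np.max(np.abs(activity @ values - target)))
```

The loop stops after a full sweep in which no stimulus needed correction, which mirrors the finite number of corrections asserted by the theorem.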


The  following  Lemma  and  Corollaries  establish  that  the  various 
weaker  forms  of  correction  procedures  are  also  capable  of  yielding  a 
solution  to  R (W)  . 

LEMMA  2:  For  the  same  conditions  as  Theorem  2,  given  that  a 

solution  exists,  the  set  of  all  solutions  forms  a  hyperplane 
of  dimension  equal  to  the  nullity  of  G  . 

PROOF: Let $Gx = u$ be a solution. Of necessity $r^*(u_i) = R_i$. Let
$Gy = u$ be another solution. Then $G(x - y) = 0$, consequently $x - y$
is in the null space of $G$. Conversely, if $y - x$ is in the null space of $G$,
then $G(y - x) = 0$. Therefore, $Gy = u$, so that $y$ is a solution. Q.E.D.
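
Lemma 2 can be illustrated with a small singular G-matrix. The sketch below is illustrative only. It assumes that $g_{ij}$ counts the A-units shared by stimuli i and j (so that G is the product of a 0/1 activity matrix with its transpose), and the activity matrix itself is made up for the example. Moving a state along the null space of G leaves the R-unit inputs unchanged, so every such state is again a solution.

```python
import numpy as np

# Illustrative activity matrix: 4 stimuli (rows) by 2 A-units (columns).
activity = np.array([[1.0, 0.0],
                     [0.0, 1.0],
                     [1.0, 1.0],
                     [1.0, 0.0]])
G = activity @ activity.T          # assumed form: g_ij = number of shared A-units

rank = np.linalg.matrix_rank(G, tol=1e-9)
nullity = G.shape[0] - rank

# Rows of Vt beyond the rank span the null space of G.
_, _, Vt = np.linalg.svd(G)
null_basis = Vt[rank:]

x = np.array([0.5, -1.0, 2.0, 0.0])   # any state; u = Gx is the input it produces
u = G @ x
y = x + 3.0 * null_basis[0] - 1.5 * null_basis[1]   # move within the solution hyperplane
same_input = bool(np.allclose(G @ y, u))
```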

COROLLARY 1: For the conditions of Theorem 2, and a phase space which
is unbounded in all dimensions, the probability of convergence
to an arbitrarily close approximation to $R(W)$ by
means of a random-sign correction procedure or a random-perturbation
correction procedure may be less than 1.

PROOF: The random-sign and random-perturbation procedures were
defined in Section 5.6. $R(W)$ is taken to be any response function,
obtainable by an R-unit with a monotonic signal generating function. For
convergence to occur, it would be necessary that a series of steps by
increments of fixed magnitude, $|\eta|$, but of random sign, should carry
the system from its initial state to an arbitrarily small distance, $\epsilon$,
from its required state. From Lemma 2, the solution states form a hyperplane
of dimension equal to the nullity of $G$, which has zero measure over
the phase space of the system. But a random walk of the type described
may carry the system arbitrarily far from its starting point, in a random
direction, and the probability that a vertex of this path will fall within a
distance $\epsilon$ of the solution hyperplane may be less than unity.

COROLLARY 2: Given the conditions of Theorem 2, and a phase space
bounded in all dimensions, then (given that a solution to
$R(W)$ exists in this bounded space) the response function
can always be approximated by means of the random-sign
correction procedure, the system converging in finite time
to an approximation $R(W) + \epsilon$, $\epsilon$ a vector, where
$|\epsilon_i| < \epsilon'$ for arbitrarily small $\epsilon' > 0$.

PROOF: Since the phase space is finite, the set of solution points within
the bounds defined above has positive measure. The random-sign correction
procedure cannot carry any of the A-unit outputs beyond the limit set for its
value; therefore, if the values approach their limit in any direction, a random
walk in the opposite direction will follow. This procedure will
ultimately take the representative point of the system into every set with
positive measure, provided $|\eta|$ is sufficiently small. Consequently, a
solution within the bounds stated by the theorem will be obtained in finite
time.

COROLLARY 3: Given the same conditions as Corollary 2, the
response function can always be approximated by
the random-perturbation correction procedure, the
system converging in finite time to an approximation
$R(W) + \epsilon$, $\epsilon$ having elements of magnitude $|\epsilon_i| \le |\eta|$
if the reinforcement is quantized, or $|\epsilon_i| \le \epsilon' > 0$,
if $\eta$ is chosen from a continuous distribution around
zero.

PROOF: The proof follows the same line as that of Corollary 2. Since
each connection can be set to an independent value, in the quantized case
the total error over the set of all connections need not be greater than $|\eta|$,
while in the continuous case it may be made arbitrarily small.

Theorem  2  and  its  corollaries  indicate  that  it  is  possible  to 
teach  a  simple  perceptron  to  produce  responses  which  are  proportional  to 
some  metric  feature  of  the  input  stimuli,  such  as  their  size,  or  coordinates 
of  their  center  of  gravity  on  the  retina.  In  the  latter  case,  the  output  of 
such  an  R-unit  can  be  fed  back  to  the  optical  system  to  control  the  centering 
of  a  stimulus  in  the  field. 


10.3  Perceptrons  with  Non-linear  Transmission  Functions 


In all perceptrons considered thus far, the transmission
functions of connections from A-units to the R-unit have been of the form

$$c_{ij}^* = a_i^* v_{ij}$$

We will now consider functions of the more general form:

$$c_{ij}^* = f(a_i^*, v_{ij})$$

Where time is not specified, this is understood to mean

$$c_{ij}^*(t) = f(a_i^*(t - \tau), v_{ij}(t))$$

Since $a_i^*$ is a function of the input signal, $\alpha_i$, the transmission
function can be written in a still more general form (allowing for various types
of signal-generating functions in the A-units),

$$c_{ij}^* = f(\alpha_i, v_{ij})$$

This form will be employed in the following theorems.

THEOREM 3: Given a simple perceptron with a simple R-unit, and with
transmission functions for all A-R connections of the form
$f(\alpha_i)\, v_{ij}$, where $f$ is any function, and given the
existence of a solution to a classification function $C(W)$
for this perceptron, then if $p(v)$ is any polynomial of
odd degree in $v$, there also exists a solution if the
transmission function is changed to $f(\alpha_i)\, p(v_{ij})$.

PROOF: A polynomial of odd degree can assume all possible values.
Therefore if $v_{ij}$ is the original value of the connection $c_{ij}$, there
exists a solution to $p(x) = v_{ij}$ yielding a new value, $x$, for the
connection $c_{ij}$ which will cause it to transmit an identical signal under
the new transmission function.
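
The substitution used in the proof of Theorem 3 can be carried out numerically. The following is a minimal sketch: the polynomial and the connection values are arbitrary examples, and the helper `invert_odd_polynomial` is a hypothetical name. For each original value it finds a real x with p(x) equal to that value, so that the connection transmits the same signal under the new transmission function.

```python
import numpy as np

def invert_odd_polynomial(coeffs, target):
    """Return a real x with p(x) = target, for p of odd degree.

    coeffs lists the coefficients of p from the highest power down, as
    expected by numpy.roots and numpy.polyval.
    """
    if len(coeffs) % 2 != 0:
        raise ValueError("p must have odd degree")
    shifted = np.array(coeffs, dtype=float)
    shifted[-1] -= target                  # roots of p(x) - target
    roots = np.roots(shifted)
    # An odd-degree real polynomial always has at least one real root.
    best = roots[np.argmin(np.abs(roots.imag))]
    return float(best.real)

p = [1.0, -2.0, 0.5, 3.0]                  # p(v) = v^3 - 2v^2 + 0.5v + 3 (arbitrary example)
old_values = np.array([-1.5, 0.0, 2.0, 7.25])   # values solving the original problem
new_values = np.array([invert_odd_polynomial(p, v) for v in old_values])
transmitted = np.polyval(p, new_values)    # p(new value) reproduces each old value
```

A polynomial of even degree would fail here for some targets, since its range is bounded on one side; that is why the theorem is restricted to odd degree.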

THEOREM 4: Given the perceptron of Theorem 3, if a solution exists
for some transmission function $f(\alpha_i)\, v_{ij}$, a solution
does not necessarily exist for the transmission function
$g(\alpha_i)\, v_{ij}$, $g \ne f$.

PROOF: Suppose the number of A-units is equal to the number of stimuli
in $W$. Let $B$ = matrix of elements $b_{ij}$ representing the value of the
function $f(\alpha_i^{(j)})$ which is the coefficient of $v_i$ for stimulus $S_j$.
Then for a solution to exist, there must be some vector $V$ and some
vector $U$ in the orthant required by $C(W)$, such that $BV = U$. But if $B$
is singular, there must be some $C(W)$ for which no solution exists. This
can be demonstrated by noting that each $C(W)$ requires a solution vector in
a different orthant, the set of all $C(W)$ requiring solutions in every possible
orthant. But if $B$ is singular, it maps the entire space into a hyperplane,
and this plane must fail to intersect certain orthants. Consequently, the
functions $C(W)$ which are represented by vectors in those orthants have no
solution. Now consider the following cases:


CASE 1: For the transmission function $\alpha v$, let the matrix

$$B = \begin{pmatrix} 1 & 1 & 1 \\ 1 & 2 & 3 \\ 2 & 3 & 4 \end{pmatrix}$$

This is singular, and consequently there are some insoluble classifications.
Now change the transmission function to $\alpha^2 v$, yielding

$$B = \begin{pmatrix} 1 & 1 & 1 \\ 1 & 4 & 9 \\ 4 & 9 & 16 \end{pmatrix}$$

This matrix is non-singular, so that with the non-linear transfer function,
all classifications are soluble.
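
The two matrices of Case 1 can be checked directly. The sketch below is illustrative: it computes both determinants exactly, and then, assuming that the rows of B are indexed by stimulus, shows the linear dependency that makes one sign pattern unreachable with the singular matrix. The helper names `det3` and `response` are hypothetical.

```python
def det3(m):
    """Determinant of a 3 x 3 matrix by cofactor expansion (exact for integers)."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)

B_linear = [[1, 1, 1],
            [1, 2, 3],
            [2, 3, 4]]
B_squared = [[entry ** 2 for entry in row] for row in B_linear]

det_linear = det3(B_linear)     # 0: the third row is the sum of the first two
det_squared = det3(B_squared)   # -4: non-singular

def response(matrix, values):
    """U = BV for a list-of-rows matrix and a list of connection values."""
    return [sum(b * v for b, v in zip(row, values)) for row in matrix]

# With the singular matrix, U[2] = U[0] + U[1] for every V, so the sign
# pattern (+, +, -) can never be produced.
U = response(B_linear, [0.7, -1.2, 2.0])
dependency_holds = abs(U[2] - (U[0] + U[1])) < 1e-12
```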


CASE 2: In this case it is shown, conversely, that there may be situations
in which a linear transmission function will yield solutions which are unobtainable
with a particular non-linear function. Let the transmission
function be $\alpha v$, with the matrix $B$ = [matrix illegible in source; legible entries 3, 5, 8].
This matrix is non-singular, so there is a solution for every $C(W)$. Now let the
transmission function be [illegible in source]. Then $B$ becomes
[matrix illegible in source; legible entry 9],
implying that there is some $C(W)$ with no solution.

THEOREM 5: Given a simple perceptron with A-R connections which
differ in their transmission functions (or with uniform
transmission functions but non-simple A-units) a response
function $R(W)$ may have a solution which is unattainable by
either the error correction procedure or the random-sign
correction procedure.

PROOF: Consider a perceptron with a single sensory unit and two A-units.
Let the R-unit be a linear amplifier with gain of 1. Let the sensory unit
emit signals 0, 1, or 2 depending upon the intensity of the stimulus. The
required response function is $R(W) = (0, +1, -1)$ corresponding to a null
stimulus, a low-intensity stimulus, and a high-intensity stimulus, respectively.
Let the transmission function of $c_{1r}$ be $\alpha_1 v_{1r}$, and the transmission function
of $c_{2r}$ be $\alpha_2^2 v_{2r}$. The response function $R(W)$ then has a solution if we
set $v_{1r} = 2.5$ and $v_{2r} = -1.5$ (so that $2.5 - 1.5 = +1$ and
$2(2.5) + 4(-1.5) = -1$). But this is the only possible solution,
and is unattainable by the error correction or random-sign procedures, since
both connections are always activated together and consequently must always
be equal in value under these procedures (assuming that their initial values
are equal). This example is sufficient to prove the theorem for the case of
non-uniform transmission functions.
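
The first example in the proof of Theorem 5 can be reproduced in a few lines. This is a sketch under an assumption: the two transmission functions are taken to be $\alpha v$ and $\alpha^2 v$, which is the reading consistent with the solution values 2.5 and -1.5. The fixed-increment rule in `run_error_correction` is likewise a simplified stand-in for the error correction procedure, chosen only to show that connections which are always active together receive identical changes.

```python
import numpy as np

signals = np.array([1.0, 2.0])        # low- and high-intensity stimuli (the null stimulus gives 0)
required = np.array([1.0, -1.0])      # required responses to those stimuli

# Rows: stimuli; columns: coefficient of each connection value
# (alpha for the first connection, alpha squared for the second).
coefficients = np.column_stack([signals, signals ** 2])
solution = np.linalg.solve(coefficients, required)   # unique, since the matrix is non-singular

def run_error_correction(steps=50, eta=0.1):
    """Fixed-increment correction: every active connection gets the same step."""
    values = np.zeros(2)              # equal initial values
    for step in range(steps):
        k = step % 2
        error = required[k] - coefficients[k] @ values
        if error != 0:
            values += eta * np.sign(error)   # both connections are active, both change alike
    return values

final = run_error_correction()
stayed_equal = bool(final[0] == final[1])
```

Since the two values never separate, the state (2.5, -1.5) cannot be reached from equal initial values, which is the point of the example.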


For the second case, in which all transmission functions are
uniform, but the perceptron has non-simple A-units, consider the following
perceptron:

[Figure: diagram of the perceptron, showing the R-unit]

The values of all S-A connections are +1, and the A-units are both linear,
with transmission function $\alpha v$. Let the environment consist of the two
stimuli $S_1$ and $S_2$ [stimulus definitions illegible in source]. Then a solution exists to
the response function $R(W)$ [values illegible in source], namely $v_{1r} = +3$, $v_{2r} = -2$.
However, the error-correction or random-sign correction procedures will
not work, since both A-units are always active (where "active" means that
they emit a non-zero signal). Note that a solution also exists to the
classification $C(W)$ [values illegible in source] for this perceptron, and that this is also
unattainable by the methods indicated.


The  sixth  theorem  was  proposed  by  R.  D.  Joseph. 

THEOREM 6: Given a simple perceptron with any mixture of transmission
functions $f_j(\alpha_j, v_{jr})$ for the connections $c_{jr}$, and
a response function $R(W)$ for which a solution exists; then
there exists some transmission function $g(\alpha, v)$ which
is uniform for all connections, such that a solution to $R(W)$
exists.

PROOF: Let $f_j(\alpha_j^{(i)}, v_{jr})$ = signal from unit $a_j$ when stimulus $S_i$
occurs. Then we can fit a polynomial

$$f_j(\alpha_j^{(i)}, v_{jr}) = \sum_{k=0}^{n-1} b_k^{(j)} \left(\alpha_j^{(i)}\right)^k$$

for each stimulus $S_i$. The coefficients, $b_k^{(j)}$, (which depend on the
A-unit, $a_j$) can be replaced by polynomials

$$b_k^{(j)} = \sum_{l=0}^{N-1} d_{kl}\, j^l$$

Thus we have, for all values of $j$,

$$f_j(\alpha_j^{(i)}, v_{jr}) = \sum_{k=0}^{n-1} \sum_{l=0}^{N-1} d_{kl}\, j^l \left(\alpha_j^{(i)}\right)^k = g(\alpha_j^{(i)}, v_{jr})$$

which satisfies the conditions required by the theorem for $g(\alpha, v)$
if we set $v_{jr} = j$.

It should be noted that this theorem applies only to a given response
function for which a solution exists; if a different response function also has
a solution, then there will again be a uniform transmission function for all
A-units which will solve the problem, but this transmission function may
differ from the one obtained for the original response function.

We  have  seen  in  Theorem  5  that  if  the  connections  differ  in 
transmission  functions,  or  the  A-units  differ  in  signal  generating  functions, 
response  functions  may  have  solutions  which  cannot  be  obtained  by  the  more 
systematic  correction  procedures.  The  following  theorem  proves  that  in 
this  case  the  weakest  of  the  correction  procedures  (the  random  perturbation 
method)  can  still  be  used  successfully. 

THEOREM 7: Given a simple perceptron with an R-unit which is either
simple or has a continuous signal generating function,
and with any combination of transmission functions from
its A-units (all continuous functions of $v_{ij}$, equal to
zero if $a_i^* = 0$), and given a bounded phase space
within which a solution exists for $R(W)$; then, if each
stimulus in $W$ ultimately reoccurs, an approximate
solution $R(W) + \epsilon$ is always attainable in finite time
by the random-perturbation correction procedure.

PROOF: For an R-unit of the specified type, and a bounded phase space,
the solution set has positive measure, over the region defined by $R(W) + \epsilon$
(where $\epsilon$ consists of arbitrarily small elements, $|\epsilon_i| \le \epsilon'$). To achieve
an approximate solution within this set, it is only necessary to adjust the
values of the active A-units for each stimulus. Since, under the random
perturbation procedure, each active connection will independently tend to
assume a value in every admissible range with positive measure, the active
set of connections as a whole will ultimately attain a value configuration
within the solution set.
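
Theorem 7 can be tried on the two-connection example used for Theorem 5. The sketch below rests on several assumptions that are not taken from the text: the transmission functions are read as $\alpha v$ and $\alpha^2 v$, the random-perturbation procedure is modelled as an independent uniform increment applied to each active connection whenever the current stimulus is answered incorrectly, and the tolerance, step size, bounds, and seed are arbitrary. Because the connections move independently, the state can wander into the neighborhood of the solution that the systematic procedures cannot reach.

```python
import numpy as np

TARGETS = {0: 0.0, 1: 1.0, 2: -1.0}       # required response for each sensory signal

def response(signal, values):
    """Linear R-unit fed by two connections with transmission alpha*v and alpha^2*v."""
    return signal * values[0] + signal ** 2 * values[1]

def random_perturbation(tol=0.25, step=0.5, bound=4.0, max_steps=2_000_000, seed=0):
    rng = np.random.default_rng(seed)
    values = np.zeros(2)
    for t in range(max_steps):
        if all(abs(response(s, values) - r) <= tol for s, r in TARGETS.items()):
            return values, t
        signal = t % 3                     # stimuli recur in a fixed cycle
        if signal != 0 and abs(response(signal, values) - TARGETS[signal]) > tol:
            # Each active connection is perturbed independently, within the bounds.
            values = np.clip(values + rng.uniform(-step, step, size=2), -bound, bound)
    raise RuntimeError("no solution reached within max_steps")

values, steps_taken = random_perturbation()
errors = [abs(response(s, values) - r) for s, r in TARGETS.items()]
```

The number of steps needed depends strongly on the tolerance and on the size of the bounded region, which reflects the weakness of the procedure noted in the text.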

10.4  Optimum  Transmission  Functions 

The  general  conclusions  of  the  preceding  pages  are  that  while  a 
completely  linear  perceptron  does  not  work  satisfactorily,  there  are  many 
possible  transmission  functions  which  seem  to  work  quite  well.  For  many 
of  these,  there  is  no  choice  to  be  made  from  the  standpoint  of  ability  to 
achieve  a  solution,  for  they  all  seem  to  be  capable  of  solving  the  same 
problems  equally  well.  From  the  standpoint  of  efficiency  of  discrimination 
and  speed  of  learning,  however,  the  various  transmission  functions  might 
differ  considerably  from  one  another.  In  this  section,  making  use  of  an 
analysis  due  to  Joseph,  it  will  be  shown  that  with  some  fairly  weak  constraints 
on  the  system  under  consideration,  an  optimum  transmission  function  exists, 
and that this takes the form of a quadratic function of $v_i$ rather than a
linear function.

The  constraints  on  the  system  to  be  analyzed  are  as  follows: 

1. The analysis deals with S-controlled discrimination
experiments, with a fixed training sequence.

2. The conditional distribution of $v_i$, for connections activated
by a test stimulus of the positive class, $S_x$, is assumed to be independent
of the choice of $S_x$. Similarly, the distribution of $v_i$ for active
connections is assumed to be independent of the exact choice of $S_x$ when the
test stimulus is selected from the negative class.

3. It is assumed that the conditional distribution of $v_i$ for
the connections activated by $S_x$ is a normal distribution, and that either
the distributions are different or the probabilities $Q_x$ are different, for
test stimuli in the positive and negative classes. These constraints will
generally be met satisfactorily if the positive class consists of all possible
positions on the retina of a large stimulus, and the negative class consists
of all possible positions of a small stimulus. The main requirement is one
of equivalence of stimuli within each class, and dissimilarity between classes,
with respect to the distribution or number of signals transmitted from A-units
to the R-unit.


The  discrimination  problem  can  be  stated  as  one  of  testing  a 
hypothesis  about  the  test  stimulus,  .  The  response  unit  is  required 

to  test  the  hypothesis  that  5^  is  a  member  of  the  positive  class  against 
the  possibility  that  it  is  a  member  of  the  negative  class.  If  the  test  stimulus 
is  a  member  of  the  positive  class,  the  output  of  an  A-unit  (subject  to  the 
above  assumptions  about  the  system  being  analyzed)  will  have  the  distribution 


0  with  probability  /  -  0^  (+) 


!  cA  with  density  function 


)2TT 


exp  ■ 


I  ^cr, 


T  I 

{+)  J 


(10,3) 
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where  cr^^j  ,  and  are  the  parameters  characterizing 

stimuli  of  the  positive  class.  Similarly,  if  the  test  stimulus  is  a  member  of 
the  negative  class,  the  output  of  an  A -unit  will  have  the  distribution 


0  with  probability 


?/■  with  density  function  exp. 


I  , 


(10.4) 


where  Qy(~),  3re  the  parameters  characterizing 

stimuli  of  the  negative  class.  Thus,  the  problem  can  be  restated  as  one  of 
testing  whether  the  output  of  an  A -unit  has  the  distribution  (10.3)  or  the 
distribution  (10.4). 


There is thus a simple hypothesis (dealing with a single distribution)
and a simple alternative. As Joseph has observed, under these conditions,
for any significance level, the likelihood ratio test is most powerful. In
performing this test, we would make $N_a$ independent observations of $a_i^*$
(corresponding to a sample of $N_a$ A-units with independent origin point
configurations), and obtain the likelihood ratio:

$$L = \left[\frac{1 - Q_{x(+)}}{1 - Q_{x(-)}}\right]^{N_a - N} \left[\frac{Q_{x(+)}\,\sigma_{(-)}}{Q_{x(-)}\,\sigma_{(+)}}\right]^{N} \exp\left\{-\frac{1}{2}\sum_i\left[\left(\frac{a_i^* - \mu_{(+)}}{\sigma_{(+)}}\right)^2 - \left(\frac{a_i^* - \mu_{(-)}}{\sigma_{(-)}}\right)^2\right]\right\}$$

where $N$ is the number of active A-units, and the summation on $i$ is over
active units only. If $L$ is greater than a preassigned constant $L_0$, we
accept the hypothesis that $S_x$ is a member of the positive class; if $L$
is less than $L_0$, we accept the alternative, that $S_x$ is a member of
the negative class. The constant $L_0$, corresponding to the threshold
of the R-unit in a perceptron employing this procedure, determines the
power and significance of the test. (The "significance" is measured by
the probability of erroneously rejecting a positive stimulus, and the "power"
is the probability of correctly classifying a negative stimulus.) In logarithmic
form, the condition $L \ge L_0$ becomes

$$\sum_i \left\{ \log\frac{Q_{x(+)}\,\sigma_{(-)}\,(1 - Q_{x(-)})}{Q_{x(-)}\,\sigma_{(+)}\,(1 - Q_{x(+)})} - \frac{1}{2}\left(\frac{a_i^* - \mu_{(+)}}{\sigma_{(+)}}\right)^2 + \frac{1}{2}\left(\frac{a_i^* - \mu_{(-)}}{\sigma_{(-)}}\right)^2 \right\} \ge \log L_0 - N_a \log\frac{1 - Q_{x(+)}}{1 - Q_{x(-)}}$$


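Thus, the required test is effectively performed if the perceptron is designed
with R-units having a threshold
$\theta = \log L_0 - N_a \log\dfrac{1 - Q_{x(+)}}{1 - Q_{x(-)}}$,
and the transmission functions from A to R-units are of the form

$$f(a_i^*) = \begin{cases} 0 & \text{if } a_i^* \le 0 \\ \left[\dfrac{1}{2\sigma_{(-)}^2} - \dfrac{1}{2\sigma_{(+)}^2}\right] a_i^{*2} + \left[\dfrac{\mu_{(+)}}{\sigma_{(+)}^2} - \dfrac{\mu_{(-)}}{\sigma_{(-)}^2}\right] a_i^* + \dfrac{\mu_{(-)}^2}{2\sigma_{(-)}^2} - \dfrac{\mu_{(+)}^2}{2\sigma_{(+)}^2} + \log\dfrac{\sigma_{(-)}\,Q_{x(+)}\,(1 - Q_{x(-)})}{\sigma_{(+)}\,Q_{x(-)}\,(1 - Q_{x(+)})} & \text{otherwise} \end{cases}$$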
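
As an illustrative check on the algebra of this section, the following minimal Python sketch compares the logarithm of the likelihood ratio, computed directly from the two mixed distributions, with the sum of the transmitted signals plus the constant term for all observed units. The function names, the representation of each class as a tuple (Q, mu, sigma), and the treatment of any non-positive output as "inactive" are assumptions made for illustration only.

```python
from math import log, pi, sqrt

def transmission(a, pos, neg):
    """Quadratic transmission function for one A-unit output a.
    pos and neg are (Q, mu, sigma) for the positive and negative classes."""
    qp, mp, sp = pos
    qm, mm, sm = neg
    if a <= 0:                       # inactive unit transmits nothing
        return 0.0
    return ((1 / (2 * sm**2) - 1 / (2 * sp**2)) * a * a
            + (mp / sp**2 - mm / sm**2) * a
            + mm**2 / (2 * sm**2) - mp**2 / (2 * sp**2)
            + log(sm * qp * (1 - qm) / (sp * qm * (1 - qp))))

def log_density(a, params):
    """Log of the mixed distribution: point mass at 0, normal density otherwise."""
    q, mu, sigma = params
    if a <= 0:
        return log(1 - q)
    return log(q) - log(sqrt(2 * pi) * sigma) - 0.5 * ((a - mu) / sigma) ** 2

def log_ratio_direct(outputs, pos, neg):
    """log L computed term by term from the two hypothesized distributions."""
    return sum(log_density(a, pos) - log_density(a, neg) for a in outputs)

def log_ratio_via_transmission(outputs, pos, neg):
    """log L as the R-unit would form it: a constant plus the transmitted signals."""
    constant = len(outputs) * log((1 - pos[0]) / (1 - neg[0]))
    return constant + sum(transmission(a, pos, neg) for a in outputs)
```

The two routes agree for any set of outputs, so comparing the summed signals with the threshold is equivalent to comparing $L$ with $L_0$.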


The actual savings that might be obtained by the use of such a
quadratic form have not been investigated numerically. In practice, they
are probably slight. A further discussion of the optimization problem,
including the optimization of the upper and lower bounds in a bounded value
perceptron, can be found in Joseph, Ref. 41.*


* Prof. A. Gamba, in a related paper, has observed that not only the
transmission functions but the reinforcement rule might be profitably modified
in order to optimize the overall decision function of the system (Ref. 23).




11. PERCEPTRONS WITH DISTRIBUTED TRANSMISSION TIMES


One of the requirements for a simple perceptron is that the
transmission time, $\tau_{ij}$, should be equal for all connections, $c_{ij}$.
In this chapter, we consider the consequences of allowing a distribution
of transmission times. It is obvious that under these conditions the set of
A-units active at time $t$ will depend not on the single momentary stimulus
occurring at time $t - \tau$, but rather on the entire sequence of stimuli
occurring between $t - \tau_{min}$ and $t - \tau_{max}$. We shall first consider
the cases of binomial and Poisson models where $\tau_{ij}$ is distributed with
a discrete spectrum, always being an integer equal to or greater than 1. We
shall then consider the case of a continuous Gaussian distribution for $\tau_{ij}$.


11.1 Binomial Models with Discrete Spectrum of $\tau_{ij}$


For the binomial case, we shall consider only the case where
each A-unit receives a fixed number of connections of each type (excitatory
and inhibitory) with $\tau_{ij} = 1$, and a fixed number with $\tau_{ij} = 2$.
Specifically, the parameters of an A-unit are:

$\theta$ = threshold (defined as usual)
$x_1$ = number of excitatory connections with $\tau_{ij} = 1$
$y_1$ = number of inhibitory connections with $\tau_{ij} = 1$
$x_2$ = number of excitatory connections with $\tau_{ij} = 2$
$y_2$ = number of inhibitory connections with $\tau_{ij} = 2$

Models with a greater number of possible values for $\tau_{ij}$ can be analyzed
by extensions of the method applied here. The object of the analysis is to find
$Q_i$ and $Q_{ij}$ at time $t$, as functions of the two-step sequences of stimuli:

$$J_i = \bigl(S_i'(t-2),\ S_i(t-1)\bigr)$$

The notation $S_i'$ will be used consistently to denote the stimulus preceding
the terminal stimulus in sequence $J_i$. Similarly, in sequences of more
than two stimuli, $S_i''$ will be used to denote the third stimulus from the
end, etc. In the present model, sequences of length greater than 2 need not
be considered. If it is assumed that A to R-unit connections all have equal
transmission times, the analysis of performance in terms of the Q-functions
will be identical with the analysis for simple perceptrons, the important
difference being that the perceptron is now learning to recognize sequences
of stimuli, rather than isolated momentary events.


The total input signal to an A-unit at time $t$, $\alpha_i(t)$, is now
a sum of four components, namely,

$$\alpha_i(t) = E_1 + E_2 - I_1 - I_2$$

where
$E_1$ = number of excitatory connections with $\tau = 1$, having origins active at $t - 1$
$I_1$ = number of inhibitory connections with $\tau = 1$, having origins active at $t - 1$
$E_2$ = number of excitatory connections with $\tau = 2$, having origins active at $t - 2$
$I_2$ = number of inhibitory connections with $\tau = 2$, having origins active at $t - 2$.

As usual, $a_i^*(t) = 1$ if $\alpha_i(t) \ge \theta$, and 0 otherwise. $Q_i$ is then
given by the following equation, which is analogous to (6.1):

$$Q_i = \sum_{E_1 + E_2 - I_1 - I_2 \ge \theta} P_{x_1}(E_1)\,P_{y_1}(I_1)\,P_{x_2}(E_2)\,P_{y_2}(I_2) \qquad (11.1)$$

where the probabilities $P_x$ and $P_y$ are defined as in (6.2), with
the substitution of the appropriate parameters, and the stimulus measures
$R_i$ in the expressions for $P_{x_1}$ and $P_{y_1}$, and $R_i'$ in the expressions for
$P_{x_2}$ and $P_{y_2}$.
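
A minimal sketch of how a sum of the form (11.1) might be evaluated numerically is given below. It assumes, since (6.2) lies outside this chapter, that each $P$ is an ordinary binomial probability whose success probability is the fraction of the retina illuminated by the relevant stimulus; the function names and arguments are hypothetical.

```python
from math import comb

def binomial_pmf(n, k, p):
    """Probability that exactly k of n connections have active origins."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def q_two_delay_binomial(theta, x1, y1, x2, y2, r_now, r_prev):
    """Probability that an A-unit is active, for connections with delays 1 and 2.
    r_now  : fraction of the retina illuminated by the terminal stimulus (seen with delay 1)
    r_prev : fraction illuminated by the preceding stimulus (seen with delay 2)"""
    total = 0.0
    for e1 in range(x1 + 1):
        for i1 in range(y1 + 1):
            for e2 in range(x2 + 1):
                for i2 in range(y2 + 1):
                    if e1 + e2 - i1 - i2 >= theta:
                        total += (binomial_pmf(x1, e1, r_now)
                                  * binomial_pmf(y1, i1, r_now)
                                  * binomial_pmf(x2, e2, r_prev)
                                  * binomial_pmf(y2, i2, r_prev))
    return total
```

With `x2 = y2 = 0` the sum reduces to the single-delay case, which gives a convenient sanity check on any implementation.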


In a similar manner, the expression for $Q_{ij}$ can be obtained by
the extension of the treatment employed in Equations 6.5 and 6.6. However,
there are now eight components to be considered for $\alpha$, for each stimulus
sequence. Specifically,

$$\alpha_i = E_i + E_c - I_i - I_c + E_i' + E_c' - I_i' - I_c'$$
$$\alpha_j = E_j + E_c - I_j - I_c + E_j' + E_c' - I_j' - I_c'$$

where $E_i$ and $I_i$ are defined, as before, as the excitatory and inhibitory
components originating from the set of retinal points situated in $S_i$ and
not in $S_j$, $E_i'$ and $I_i'$ are the corresponding components originating
from the set of retinal points situated in $S_i'$ but not in $S_j'$, and $E_j$,
$I_j$, $E_j'$ and $I_j'$ are similarly defined. Likewise, $E_c$ and $I_c$ are the
excitatory and inhibitory components coming from the retinal set common
to $S_i$ and $S_j$, and $E_c'$ and $I_c'$ are the components from the set
common to $S_i'$ and $S_j'$. Thus we have the equation

$$Q_{ij} = \sum_{\alpha_i \ge \theta,\ \alpha_j \ge \theta} P_{x_1}(E_i, E_j, E_c)\,P_{y_1}(I_i, I_j, I_c)\,P_{x_2}(E_i', E_j', E_c')\,P_{y_2}(I_i', I_j', I_c') \qquad (11.2)$$

the required multinomial probabilities being computed from equations (6.6)
with an obvious extension of the above notation to the quantities $A_i$, $A_j$,
$C$, $A_i'$, $A_j'$, and $C'$.

Since  the  Poisson  model  is  much  easier  to  compute,  and  has 
properties  which  are  similar  in  all  essentials  to  the  binomial  model,  no 
numerical  examples  are  given  for  the  binomial  model,  but  examples  for  the 
Poisson  model  can  be  found  in  the  following  section. 


11.2 Poisson Models with Discrete Spectrum of $\tau_{ij}$


The Poisson model to be considered again has two values of $\tau$,
namely $\tau = 1$ and $\tau = 2$, the parameters $x_1$, $x_2$, $y_1$, and $y_2$
being defined analogously to $x$ and $y$ in the Poisson model considered in
Chapter 6. The equations for $Q_i$ and $Q_{ij}$ can, of course, be developed
by extension of the equations of Chapter 6, as has just been done for the
binomial model. A considerably simpler approach is possible in the Poisson
model, however, if the corresponding stimulus areas at times $t-1$ and $t-2$
are also equal, i.e., $A_i = A_i'$, $A_j = A_j'$, and $C = C'$. In this
case, the previous equations (6.1, 6.3, 6.5, and 6.7) hold without modification,
except that $x = x_1 + x_2$ and $y = y_1 + y_2$. More generally, the previous
equations can always be employed by making the appropriate substitutions:

$$x R_i \to x_1 R_i + x_2 R_i'$$
$$x R_j \to x_1 R_j + x_2 R_j'$$
$$x A_i \to x_1 A_i + x_2 A_i'$$
$$x A_j \to x_1 A_j + x_2 A_j'$$

and similarly for the inhibitory components. If $x_1 = x_2$ and $y_1 = y_2$,
the equations for $Q_i$ and $Q_{ij}$ again become identical with the equations
of Chapter 6, with $R_i$ replaced by $\tfrac{1}{2}(R_i + R_i')$, $A_i$ by
$\tfrac{1}{2}(A_i + A_i')$, etc. By an obvious extension to a spectrum with
three or more values of $\tau$, where $x_1 = x_2 = \ldots = x_n$ and
$y_1 = y_2 = \ldots = y_n$, we can apply the same equations, substituting
the parameters

$$R_i \to \frac{1}{n}\left(R_i + R_i' + \ldots + R_i^{(n-1)}\right)$$

and similarly for the other stimulus measures.

As an example of the performance of such a system, consider
a Poisson model perceptron with an expected value of 6 excitatory and 6
inhibitory connections to each A-unit and $\theta = 2$. Let the environment
consist of a set of 4 by 20 vertical bars, such as were employed in the
experiments of the preceding chapters. The object will be to discriminate
a bar arriving at a certain fixed location by movement from the left from a
bar which arrives at the same location by movement from the right. Clearly,
if a single value of $\tau_{ij}$ is permitted, this task is impossible. Consider
first the case in which half of the excitatory and half of the inhibitory connections
have $\tau = 1$ and the remaining half have $\tau = 2$, so that
$x_1 = x_2 = y_1 = y_2 = 3$. Let sequence $J_i$ denote $(S_1(t-2), S_2(t-1))$
and $J_j$ denote $(S_3(t-2), S_2(t-1))$, where $S_1$, $S_2$, $S_3$ represent
successive adjacent positions of the vertical bar on the retina. Then
$Q_i = Q_j = .153$, and $Q_{ij} = .094$. Next, suppose one third of the
excitatory connections and one third of the inhibitory connections have delays
$\tau = 3$, one third have $\tau = 2$, and one third have $\tau = 1$, so that
$x_1 = x_2 = x_3 = y_1 = y_2 = y_3 = 2$. In this case, $Q_i = .153$, as before,
but $Q_{ij}$ is reduced to .063. Further increasing the spread of the
distribution will have the effect of further reducing $Q_{ij}$ (for
correspondingly lengthened stimulus sequences) while keeping $Q_i$ constant.
Thus, the greater the spread of the $\tau$ distribution,
the more readily can such "divergent" time sequences be distinguished.
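
The figures quoted in this example can be approximately reproduced by a short calculation. The sketch below is not the procedure of Chapter 6; it rests on three assumptions that are not stated in this section: a 20 by 20 retina (so that a 4 by 20 bar covers one fifth of it), successive bar positions displaced by one column, and independence of the Poisson-distributed signal counts arising from disjoint retinal regions. All function names are hypothetical.

```python
from math import exp, factorial

def poisson_pmf(mean, k):
    return exp(-mean) * mean**k / factorial(k)

def net_signal_pmf(mean_exc, mean_inh, kmax=30):
    """Distribution of E - I for independent Poisson counts E and I (sums truncated at kmax)."""
    pmf = {}
    for e in range(kmax + 1):
        pe = poisson_pmf(mean_exc, e)
        for i in range(kmax + 1):
            pmf[e - i] = pmf.get(e - i, 0.0) + pe * poisson_pmf(mean_inh, i)
    return pmf

def tail(pmf, k):
    """Probability that the net signal is at least k."""
    return sum(p for d, p in pmf.items() if d >= k)

def q_single(theta, mean):
    """Q_i when excitatory and inhibitory expected signal counts are both `mean`."""
    return tail(net_signal_pmf(mean, mean), theta)

def q_joint(theta, common_mean, unique_mean):
    """Q_ij: both sequences share the 'common' signals; each adds its own 'unique' signals."""
    common = net_signal_pmf(common_mean, common_mean)
    unique = net_signal_pmf(unique_mean, unique_mean)
    return sum(p * tail(unique, theta - d) ** 2 for d, p in common.items())

BAR = 4 * 20 / 400     # fraction of the assumed 20 x 20 retina covered by one bar
HALF = BAR / 2         # overlap of two bars displaced by two columns

q_i = q_single(2, 6 * BAR)
# Two delays (3 connections each): at t-1 the bars coincide; at t-2 they half overlap.
q_ij_two = q_joint(2, 3 * BAR + 3 * HALF, 3 * HALF)
# Three delays (2 connections each): at t-3 the bars are disjoint.
q_ij_three = q_joint(2, 2 * BAR + 2 * HALF, 2 * HALF + 2 * BAR)
print(round(q_i, 3), round(q_ij_two, 3), round(q_ij_three, 3))
```

Under these assumptions the three printed values should come out close to the .153, .094, and .063 quoted in the text, which suggests (but does not prove) that the assumed retina size and displacement match those of the original calculation.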

Conversely, two sequences which are identical save for a momentary
divergence in recent time (say at $t - 1$) can be distinguished most readily
by a perceptron with $\tau_{ij}$ concentrated at small values, and increasing
the spread of the $\tau$ distribution will only increase the difficulty of
discrimination.

It should be emphasized that the set of active A-units depends
on the order and not merely on the constituents of a stimulus sequence. Thus
the sequence $(S_1, S_2, S_3)$ will generally activate a different set of A-units
from the sequence $(S_2, S_1, S_3)$ in which the first two members have been
inverted. In principle, a perceptron of this type which receives sequences of
sound spectra from a set of audio-filters (instead of visual patterns) should be
capable of distinguishing spoken words, or other characteristic sound sequences,
such as progressions of chords or melodic fragments.
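
The dependence on order can be seen in a toy case. The following sketch uses a hypothetical three-point retina, three hand-picked A-units, and a threshold of 2; none of these values come from the text, and the representation of a connection as a (point, sign, delay) triple is an assumption made for illustration.

```python
def active_units(units, sequence, theta=2):
    """Return the names of the A-units active at time t.
    units    : maps a unit name to a list of (retinal_point, sign, delay) connections
    sequence : illuminated point sets, oldest first; the last entry occurred at t - 1"""
    n = len(sequence)
    active = set()
    for name, connections in units.items():
        total = 0
        for point, sign, delay in connections:
            # A connection with delay d reports the stimulus present at t - d.
            if delay <= n and point in sequence[n - delay]:
                total += sign
        if total >= theta:
            active.add(name)
    return active

S1, S2, S3 = {0}, {1}, {2}          # each stimulus illuminates one retinal point
UNITS = {
    "a1": [(0, +1, 3), (2, +1, 1)],
    "a2": [(1, +1, 3), (2, +1, 1)],
    "a3": [(2, +1, 1), (0, +1, 2), (1, +1, 2)],
}

original = active_units(UNITS, [S1, S2, S3])
inverted = active_units(UNITS, [S2, S1, S3])
```

Here the two sequences contain the same stimuli and end on the same one, yet `original` and `inverted` differ, because the delayed connections sample the earlier stimuli at different moments.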

11.3 Models with Normal Distribution of $\tau_{ij}$


A somewhat more "natural" model than the discrete spectrum
models considered above is one where the transmission time of each connection
is an independent random variable drawn from a normal distribution, with
parameters $\mu(\tau)$ and $\sigma(\tau)$. If an A-unit is to have a non-zero
probability of being active at time $t$ in such a model, the dynamics must be
modified by the introduction of an "integration period", $\Delta t$, such that

$$\alpha_i(t) = \sum_{\tau = t - \Delta t}^{t} \bigl[E(\tau) - I(\tau)\bigr] \qquad (11.3)$$

summing over all values of $\tau$ for which $E$ or $I$ (the numbers of excitatory
or inhibitory impulses arriving at the A-unit) are non-zero.
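
A minimal sketch of the windowed sum in (11.3) follows. The representation of impulses as (arrival time, sign) pairs and the helper that draws normally distributed transmission times are assumptions introduced for illustration; the names are hypothetical.

```python
import random

def integrated_input(arrivals, t, dt):
    """Net input at time t: excitatory minus inhibitory impulses arriving within [t - dt, t].
    arrivals is a list of (arrival_time, sign) pairs, with sign +1 or -1."""
    return sum(sign for time, sign in arrivals if t - dt <= time <= t)

def sample_arrivals(onset, n_exc, n_inh, mu, sigma, rng):
    """Impulses caused by a momentary stimulus at `onset`, each delayed by an
    independent, normally distributed transmission time."""
    signs = [+1] * n_exc + [-1] * n_inh
    return [(onset + rng.gauss(mu, sigma), sign) for sign in signs]
```

For example, `integrated_input(sample_arrivals(0.0, 6, 6, 5.0, 1.0, random.Random(0)), t, 0.5)` could be evaluated over a range of `t` to see how the net input to one unit builds up and dies away around `t = 5`.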


The qualitative properties of such a system are clear without
further analysis. If $\Delta t$ is short compared to $\sigma(\tau)$, the presentation of
a momentary or transient stimulus will lead to a gradual increase in the
proportion of responding A-units (or the value of $Q_i$) followed by a gradual
decrease. If $\Delta t$ is greater than $\sigma(\tau)$, the system will respond with a
momentary burst of activity, maintained for a period equal to $\Delta t$, and
will then immediately relapse to inactivity. We are chiefly concerned with
the case where $\Delta t$ is less than $\sigma(\tau)$. In this case, the performance of
the system in discriminating sequences will be close to that of the Poisson
or binomial models, with an appropriate discrete spectrum of $\tau_{ij}$, to
approximate the normal distribution. There will be a maximum sensitivity
to differences between the two sequences $J_i$ and $J_j$ occurring at
time $t - \mu(\tau)$, with less sensitivity to more recent or more remote
differences between the sequences.


12. PERCEPTRONS WITH MULTIPLE R-UNITS


Up  to  now,  the  simple  "three-layer"  topology  (S-A-R)  with  a 
single  R-unit  has  been  the  only  one  considered.  In  this  chapter,  we  will 
still  consider  only  three-layer  perceptrons,  but  more  than  one  R-unit  will 
be  permitted.  The  performance  of  such  systems,  it  will  be  seen,  does  not 
differ  significantly  from  that  of  perceptrons  which  have  been  considered  in 
previous chapters, except for the fact that it is now possible to form
classifications with more than two classes, with simple R-units, or to have
perceptrons respond simultaneously to several different attributes of a
stimulus pattern. The most interesting analytic problems for such systems
are concerned with the optimum coding of the classes of patterns to be
recognized, in order to optimize performance.

12.1 Performance Analysis for Multiple R-unit Perceptrons

Several types of topological organization which are possible for
networks with more than one R-unit are illustrated in Figure 35. The set of
A-units which are connected to a given R-unit will be called the source-set
of that R-unit. The organization which is most economical in the number of
A-units employed is that shown in Fig. 35(a), where every A-unit is connected
to every R-unit. This is logically equivalent to the disjoint source-set model
shown in Fig. 35(b), if every source set is required to have the same
composition of origin point configurations for its A-units. Unless otherwise
specified, it will be assumed that each R-unit receives the same number of input
connections; however, if the R-set is large, and the terminus of each connection
from an A-unit is selected at random, the total number of inputs to each R-unit
(i.e., the size of its source set) will be a binomially distributed random
variable. An inversion of this connection procedure is shown in Fig. 35(c).
In this case, each R-unit receives exactly $N$ connections, but the origins
are assigned at random among the A-units. Here the number of output
connections from an A-unit will be a Poisson distributed random variable.


Figure 35 TYPES OF TOPOLOGICAL ORGANIZATION FOR PERCEPTRONS WITH MULTIPLE R-UNITS
(a) EVERY A-UNIT CONNECTED TO $\lambda$ R-UNITS. (IN FULLY COUPLED CASE, $\lambda = N_R$)
(b) DISJOINT SOURCE-SET FOR EACH R-UNIT. (SPECIAL CASE OF (a) WHERE $\lambda = 1$)
(c) EACH R-UNIT HAS SOURCE SET OF $N$ RANDOMLY SELECTED A-UNITS. (EQUIVALENT TO (a) IF $N = N_a$)


It can be readily seen that as $N_a$ becomes large, the various
topological connection schemes illustrated in Fig. 35 all become logically
equivalent in their performance characteristics, since it does not matter to
the performance of the perceptron whether two R-units are connected to the
identical A-unit or to two different A-units with equivalent origin point
configurations. For the sake of specificity, the following discussion will
assume the organization illustrated in Fig. 35(b), with a disjoint source-set
for each R-unit.

In S-controlled discrimination experiments, it is obvious that
performance of such a system is equivalent to that of $N_R$ simple perceptrons
(where $N_R$ is the number of R-units) each of which is exposed to the same
training sequence, but trained on its own independent dichotomy of the
environment. For example, if $N_R = 2$, one R-unit might be trained to discriminate
between stimuli in the upper and lower halves of the field, while the second
R-unit is taught to discriminate between right and left halves. The
probability that both responses are correct, at the end of the training sequence,
will be the product of the probability that $R_1$ is correct on its dichotomy,
and the probability that $R_2$ is correct on its dichotomy. In the present case,
assuming that stimuli occur with equal frequency in all parts of the field, we
would expect the two dichotomies to be equally difficult, so that the
probability of correct performance on the joint response would be the square of the
probability of correct response for either dichotomy considered separately.

In an error correction procedure, a more interesting problem
arises. Clearly, if each R-unit and its set of input connections are corrected
on an assigned binary classification or response function independently of the
other R-units, the same situation exists as in S-controlled experiments, and
the probability of correct response on the entire set of $N_R$ R-units after a
given training sequence will be the product of the probabilities for each of the
response functions considered separately. More generally, if we let
$P_i(S_x, N_i)$ = probability of correct response on test stimulus $S_x$
for the $i$-th response function, given a source-set with $N_i$ members
connected to the R-unit, we have

$$P_x = \prod_{i=1}^{N_R} P_i(S_x, N_i) \qquad (12.1)$$

for the probability that the joint response to $S_x$ is correct on all R-units.
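
As a small numerical illustration of the product rule in (12.1), the sketch below multiplies a list of per-unit probabilities of correct response. The function name and the sample probabilities are assumptions chosen for illustration, and the R-units are taken to be independent, as in the text.

```python
from math import prod

def joint_correct_probability(unit_probabilities):
    """Probability that every R-unit responds correctly, assuming independent units."""
    return prod(unit_probabilities)

# Two equally difficult dichotomies: the joint probability is the square of either one.
two_units = joint_correct_probability([0.9, 0.9])
# Eight units that are each 95 per cent reliable are jointly correct far less often.
eight_units = joint_correct_probability([0.95] * 8)
```

The second case shows how quickly the joint probability falls as R-units are added, even when each unit taken alone is quite reliable.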


Suppose,  however,  the  reinforcement  control  system  is  only 
capable  of  recognizing  that  the  total  response  (on  all  R-units  jointly)  is  right 
or  wrong,  and  cannot  tell  which  individual  R-units  are  contributing  to  the 
error.  In  this  case,  it  might  be  supposed  that  the  system  would  eventually 
learn  the  correct  joint  response  by  assuming  that  every  R-unit  is  wrong 
whenever  an  error  in  the  composite  response  occurs,  and  correcting  the 
perceptron  accordingly.  This  supposition,  unfortunately,  is  not  true, 
as  proven  by  the  following  theorem. 

THEOREM: Given a perceptron with more than one R-unit, and a
response function $R(W)$ or a classification $C(W)$ for which
a solution exists, it may be impossible to achieve
this solution by an error correction procedure which
applies negative reinforcement jointly to all R-units
based on errors in the joint response.


PROOF: The theorem can be proven by a simple example. Consider the
perceptron illustrated below, which has two sensory units, two A-units,
and two R-units. (The topology corresponds, in this case, to Fig. 35(a).)

[Diagram: sensory points $s_1$ and $s_2$, connected to A-units $a_1$ and $a_2$
respectively, with each A-unit connected to both R-units $r_1$ and $r_2$.]

Assume all $v_{ij}$ initially = +1. Let $W$ consist of two stimuli: $S_1$
illuminates sensory point $s_1$ alone, and $S_2$ illuminates $s_2$ alone. Let
the required joint classification function be:

$(r_1, r_2) = (+1, -1)$ for $S_1$
$(r_1, r_2) = (-1, +1)$ for $S_2$

A solution clearly exists, e.g., by making $v_{11}$ and $v_{22}$ positive, and
$v_{12}$ and $v_{21}$ negative. Since all $v_{ij}$ are initially positive, whichever
stimulus occurs first (say $S_1$) will elicit a positive output from both R-units,
which is wrong. The error correction procedure would then apply negative
reinforcement to both R-units, having the effect (if $S_1$ is the stimulus) of
making both connections from $a_1$ negative. But this now makes both
R-units negative, which is still wrong. Clearly, the error cannot be
corrected by reinforcement in the presence of $S_1$, since the signals to
both R-units are coupled, and must rise or fall together. If the second
stimulus should occur, the situation is not improved, and the same
oscillatory behavior will continue, with the perceptron switching from
$(r_1, r_2) = (+1, +1)$ to $(-1, -1)$ alternately. Thus a solution will
never be achieved, which proves the theorem.
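
The oscillation described in the proof can be traced step by step. The sketch below is an illustration under stated assumptions: the reinforcement step of 2 (large enough to reverse the sign of a value of magnitude 1 in a single correction) and the matrix layout `v[i][j]`, for the connection from A-unit $i$ to R-unit $j$, are choices made here and are not specified in the text.

```python
STIMULI = {"S1": (1, 0), "S2": (0, 1)}            # A-units activated by each stimulus
REQUIRED = {"S1": (+1, -1), "S2": (-1, +1)}       # required joint response (r1, r2)

def respond(v, a):
    """Responses of both R-units; v[i][j] is the value of the connection a_i -> r_j."""
    sums = [sum(v[i][j] * a[i] for i in range(2)) for j in range(2)]
    return tuple(1 if s > 0 else -1 for s in sums)

def joint_negative_reinforcement(sequence, step=2.0):
    """Whenever the joint response is wrong, push every R-unit away from its current response."""
    v = [[1.0, 1.0], [1.0, 1.0]]
    history = []
    for name in sequence:
        a = STIMULI[name]
        r = respond(v, a)
        history.append((name, r))
        if r != REQUIRED[name]:
            for i in range(2):
                for j in range(2):
                    v[i][j] -= step * r[j] * a[i]   # only active A-units are affected
    return history

trace = joint_negative_reinforcement(["S1", "S1", "S2", "S1", "S2", "S2"])
```

In every entry of `trace` the two R-units agree with one another, so the required responses $(+1, -1)$ and $(-1, +1)$ never appear, however long the sequence is made.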


Note that if, instead of administering negative reinforcement
to all R-units (which assumes that each one is currently wrong) the error
correction procedure were to be modified to apply a correction to each
response unit according to the rule

$$\eta_i = R_i^* - r_i^* \qquad (12.2)$$

where $\eta_i$ = value of $\eta$ employed in reinforcement of the $R_i$ connections,
and $R_i^*$ and $r_i^*$ are the required and obtained responses, respectively,
for the $i$-th R-unit, we then have the same conditions as in the case of
independent correction of each R-unit (see Definition 41, Chapter 5). Thus,
if we let $\vec{\eta} = \vec{R}^* - \vec{r}^*$ be a vector of $N_R$ components, the $i$-th component
being given by (12.2), the system will always converge if a solution exists.
This implies, however, that the r.c.s. must not only be able to recognize
the existence of an error in some R-component, but must be able to
determine the magnitude (or at least the sign) of the error for each R-unit
independently, and control an appropriate value of $\eta$ for each section
of the network. A logically similar procedure, which also yields a
solution, is to allow the r.c.s. to scan the R-units sequentially, checking
the correctness of each one in turn, and applying a correction only to the
R-unit currently being examined by applying negative reinforcement when
it is wrong. This requires a longer training process, but requires the r.c.s.
to act on only one component at a time, just as in a simple perceptron.
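
For comparison with the joint procedure, the following sketch applies a separate correction to each R-unit, with the reinforcement taken as the difference between the required and obtained response of that unit. It uses the same two-stimulus example as the proof of the theorem above; the data layout and function names are assumptions made for illustration.

```python
STIMULI = {"S1": (1, 0), "S2": (0, 1)}            # A-units activated by each stimulus
REQUIRED = {"S1": (+1, -1), "S2": (-1, +1)}       # required joint response (r1, r2)

def respond(v, a):
    """Responses of both R-units; v[i][j] is the value of the connection a_i -> r_j."""
    sums = [sum(v[i][j] * a[i] for i in range(2)) for j in range(2)]
    return tuple(1 if s > 0 else -1 for s in sums)

def per_unit_correction(sequence):
    """Correct each R-unit separately: eta_j = required response - obtained response."""
    v = [[1.0, 1.0], [1.0, 1.0]]
    for name in sequence:
        a = STIMULI[name]
        r = respond(v, a)
        eta = [REQUIRED[name][j] - r[j] for j in range(2)]   # zero for a correct unit
        for i in range(2):
            for j in range(2):
                v[i][j] += eta[j] * a[i]
    return v

values = per_unit_correction(["S1", "S2"])
```

After a single presentation of each stimulus the values are positive on the connections $a_1 \to r_1$ and $a_2 \to r_2$ and negative on the other two, which is the solution mentioned in the proof.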

12.2 Coding and Code-Optimization in Multiple Response Perceptrons


A  perceptron  with  a  large  number  of  R-units  can  clearly  be 
used  to  identify  many  more  than  two  alternative  kinds  of  stimuli.  A  number 
of  possible  schemes  for  the  representation  of  information  in  such  systems 
have  been  suggested.  As  a  first  possibility,  each  response  may  be  used  to 
identify  an  independent  trait,  or  property  of  the  stimulus,  such  as  left/right 
location,  size,  horizontal  or  vertical  elongation,  etc.  The  combination  of 
responses  occurring  when  a  test  stimulus  is  presented  should  then  serve  as 
a  description  of  the  stimulus  in  terms  of  its  traits.  An  alternative  scheme 
is  to  assign  a  distinct  response  unit  to  each  kind  of  stimulus,  and  train  the 
perceptron  to  emit  a  +1  response  only  if  that  type  of  stimulus  is  present. 

In  this  case,  only  one  R-unit  at  a  time  would  be  active,  the  active  unit 
identifying  the  stimulus  class.  Unlike  the  first  scheme,  where  some  response 
must  be  made  for  every  binary  trait  whether  applicable  or  not,  the  second 
scheme has the possibility of rejecting a stimulus altogether as "unknown",
in  which  case  all  R-unit  outputs  would  be  negative.  On  the  other  hand,  the 
second  scheme  lacks  the  economy  of  which  the  first  is  capable,  and  requires 
that  every  combination  of  traits  which  is  to  be  distinguished  must  be  assigned 
a  special  category  and  taught  to  the  perceptron  before  it  can  be  recognized. 

In the "trait discrimination" approach, a new configuration may still be
correctly described, in terms of the characteristics present, even though it
has not been seen before. (This last feature is only weakly present in
the perceptrons considered thus far, since it depends strongly on
generalization. Some of the perceptrons to be considered in later chapters, which
generalize more effectively, can make optimum use of "descriptive codes".)

The above examples illustrate two types of response-codes, which
will be called configuration codes and position codes, respectively. A
configuration code employs the R-units independently of one another, assigning
an arbitrary dichotomy to each. This results in the assignment of a binary
number (if the R-units are two-state devices) to each stimulus. The total
number of stimulus types which can be encoded in this fashion, for a perceptron
with $N_R$ R-units, is $2^{N_R}$. A position code, on the other hand, permits
only one R-unit to be "on" (or in the positive state) for any one stimulus; the
code takes the form of a binary number of $N_R$ bits all but one of which are
zeros. The position of the non-zero bit indicates the class of the stimulus
identified. With this system, only $N_R$ types of stimuli can be recognized.
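
The difference in capacity between the two kinds of code can be made concrete by enumeration. The sketch below represents each code word as a tuple of +1 and -1 responses; this representation and the function names are assumptions made for illustration.

```python
from itertools import product

def configuration_code(n_r):
    """All response combinations available when each R-unit has its own dichotomy."""
    return list(product((-1, +1), repeat=n_r))

def position_code(n_r):
    """Code words with exactly one R-unit in the positive state."""
    return [tuple(+1 if k == i else -1 for k in range(n_r)) for i in range(n_r)]

print(len(configuration_code(8)), len(position_code(8)))   # prints: 256 8
```

For eight R-units the configuration code offers 256 words against 8 for the position code, and every position code word is also one of the configuration code words.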

The position code can be considered a special case of a configuration code in
which the positive classes of all dichotomies are disjoint, and the negative
classes are almost completely intersecting. A compromise between the two
approaches (which permits a descriptive statement to be obtained about a
stimulus without forcing a decision on inapplicable characteristics) would
assign $n$ response units to each set of $n$ mutually exclusive traits (for
example, 2 R-units would be assigned to left/right description, 3 to
horizontal, vertical, or diagonal specification, etc.). Each R-unit would then
be made to discriminate between "trait present" and "trait absent",
permitting any combination to occur. Such a system will be classed under
configuration codes.

The problem of finding an optimum code for a particular task can
be specified for a given value of $N_R$, an environment, $W$, and a
classification, $C(W)$, into $N$ types of stimuli. Clearly, if $N$ is greater than $N_R$,
a configuration code must be used, or the problem is insoluble. If $N$ is
commensurate with $N_R$, however, we have a choice of either assigning
a position code, in which each R-unit identifies the presence or absence of
a single type of stimulus, or assigning a configuration code, in which each
R-unit is assigned an arbitrary dichotomy. In general, the problem is to
find the optimum set of dichotomies to be assigned to the R-units, so as to
obtain the greatest probability of correct identification for an arbitrarily
selected test stimulus. Let us assume all stimuli equally likely to occur,
and all classes of equal size (i.e., an equal number of stimuli in each). The
number of A-units connected to each R-unit is also assumed to be constant.

Let the vector $\vec{R}^* = (r_1^*, r_2^*, \ldots, r_{N_R}^*)$ = the correct response
vector for a given test stimulus. Then, from equation (12.1) we are
required to maximize

$$P(\vec{R}^*) = \prod_{i=1}^{N_R} P(r_i^*)$$

where $P(r_i^*)$ denotes the probability that the $i$-th R-unit gives its required
response. Since we further assume that $S_x$ is chosen arbitrarily, and that every
stimulus is equally likely to be chosen as a test stimulus, we require the
expected value

$$E\left[\prod_{i=1}^{N_R} P(r_i^*)\right] \qquad (12.3)$$

to be maximal. The choice of dichotomies which maximizes (12.3) would be
considered an optimum code for the environment and perceptron in question.
At present, no general solution to this problem has been found. Several
heuristic cues as to the organization of optimal codes are worth noting,
however.

(1) If a given stimulus class has members which are
disjoint from the stimuli of all other classes, while the remaining classes
have large retinal intersections, it will clearly be advantageous to employ
a single R-unit for the recognition of the stimulus class in question, with a
highly asymmetric dichotomy which does not attempt to divide
the remaining stimuli into two sub-sets, but takes advantage of the
"natural" dichotomy formed on the basis of location.

(2) If the relationships of all stimulus classes are symmetric,
so that no two classes tend to "stick together" more than any other two
classes, and no pair of classes are easier to discriminate than any others,
and if S-controlled reinforcement is to be used, it will probably be best to
use equal dichotomies for all R-units (half of the stimuli in each positive
set) so as to avoid asymmetric generalizations from the larger set to the smaller
one. The results of the frequency bias experiments, illustrated in Figs. 16
and 25, appear to support this conjecture. Where an error correction
method is used, however, empirical results suggest that asymmetric
dichotomies are preferable.

(3) There exist classifications which cannot be achieved by
means of a position code, which can be achieved with a configuration code.
For example, consider the following case: Let there be three stimuli in
$W$, such that $S_1$ activates $a_1$, $S_2$ activates $a_2$, and $S_3$ activates
both $a_1$ and $a_2$. Let there be three simple R-units, each connected to
both $a_1$ and $a_2$. It is required to assign a unique code number to each
of the three stimuli. With a position code, the R-unit assigned to identify
$S_3$ must give a positive response when both $a_1$ and $a_2$ are active, but a
negative response when either $a_1$ or $a_2$ alone is active. This is clearly
impossible, with simple R-units. However, if a configuration code is
employed, we can assign the R-function $(r_1, r_2, r_3)$ =

(+1, -1, -1) for $S_1$
(-1, +1, -1) for $S_2$
(+1, +1, -1) for $S_3$

which is readily soluble, by an error correction procedure. $r_3$ is
obviously redundant here, and is arbitrarily set to -1 for all stimuli.

(4)  A  general  rule,  proposed  by  Joseph,  is  the  following: 
The  smallest  possible  number  of  R-units  should  be  required  to  distinguish 
between  very  similar  stimuli.  The  more  dissimilar  two  stimuli  are,  the 
more  R-units  may  be  allowed  to  place  the  two  in  opposite  classes. 

Note that in the example of item (3), it is possible to assign an arbitrary
classification to an environment of 3 stimuli with only 2 A-units. This could
not be done with a simple perceptron (as proven in Corollary 2 of Theorem 3,
Chapter 5). The addition of a second R-unit in this model substitutes for
the missing A-unit which would otherwise be required.
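The three-stimulus example of item (3) can be checked numerically. The following is a minimal sketch, not the procedure used on the Mark I. It assumes that the first stimulus activates only the first A-unit, the second only the second, and the third both; that a simple R-unit responds with the sign of its summed input (a zero sum counts as an error); and that values start at zero with a unit correction. The names `STIMULI`, `respond` and `train_r_unit` are hypothetical.

```python
# A-unit activity (a1, a2) for the three stimuli S1, S2, S3 (assumed layout).
STIMULI = [(1, 0), (0, 1), (1, 1)]

def respond(values, activity):
    """Simple R-unit: sign of the summed input (0 if the sum is zero)."""
    total = sum(v * a for v, a in zip(values, activity))
    return (total > 0) - (total < 0)

def train_r_unit(targets, max_sweeps=100):
    """Error-correction training of one R-unit.

    Returns the learned values, or None if no sweep is error-free
    within max_sweeps passes over the stimuli.
    """
    values = [0, 0]
    for _ in range(max_sweeps):
        errors = 0
        for activity, target in zip(STIMULI, targets):
            if respond(values, activity) != target:
                errors += 1
                # Shift the values of the active connections toward the target.
                values = [v + target * a for v, a in zip(values, activity)]
        if errors == 0:
            return values
    return None

# Position code: the unit for S3 must be +1 only when both A-units are active.
position_s3 = train_r_unit([-1, -1, +1])

# Configuration code: responses of r1, r2, r3 to (S1, S2, S3).
config_r1 = train_r_unit([+1, -1, +1])
config_r2 = train_r_unit([-1, +1, +1])
config_r3 = train_r_unit([-1, -1, -1])

print("position code, unit for S3:", position_s3)
print("configuration code:", config_r1, config_r2, config_r3)
```

The position-code unit never settles, because it would need two non-positive values whose sum is positive, while each configuration-code unit reaches an error-free sweep after a few corrections.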
In empirical tests with the Mark I perceptron (such as the
experiments described in the following section) it has been found that the
choice of a code, even with binary numbers of a fixed length, can easily
determine whether or not a particular task is within the perceptron's
capability.

12.3  Experiments  with  Multiple  Response  Systems 


The Mark I perceptron at C.A.L. is equipped with eight binary
R-units, and 512 A-units, which can be employed in any combination. The
network topology is of the type shown in Fig. 35(b). A number of experiments
have been performed (Ref. 30) dealing with the recognition of letters of the
alphabet and sets of geometrical patterns where multiple classifications are
required. Two such experiments are illustrated in Figures 36 and 37.

In Fig. 36, learning curves are shown for an S-controlled
experiment on the left, and for an error-correction experiment on the right.
In each case, the perceptron was taught to identify eight letters of the
alphabet, presented in the form of large block letters in random locations,
over a considerable part of the retinal field. In the error correction
procedure, each of the erroneous R-units is corrected simultaneously.

Figure 37 shows the learning curve for the entire alphabet,
presented in fixed position. A partially optimized binary code employing
five R-units was used here. This represents about the limit of the capacity
of the Mark I system. Attempts at teaching the Mark I to recognize all
26 letters in two type faces simultaneously have been unsuccessful, the
maximum performance being about 85% on the combined alphabets. With a
discrimination task of this difficulty, any displacement of the patterns from
the position where they have been learned is likely to abolish the correct
response.

Figure 36 LEARNING CURVES FOR EIGHT LETTER IDENTIFICATION TASK

Figure 37 LEARNING CURVE FOR 26 LETTERS: CORRECTIVE TRAINING


On  easier  problems,  such  as  a  four-letter  discrimination  task, 
the  choice  of  code  is  found  to  make  little  difference  in  system  performance. 

The  code  becomes  critical  only  when  the  discrimination  capability  is  marginal, 
as  in  the  26  letter  identification  task.  Given  the  choice  between  a  position 
code  and  a  configuration  code  with  the  number  of  A-units  in  a  source-set  held 
constant,  the  position  code  generally  seems  preferable  with  the  kinds  of 
stimulus  material  employed  in  these  experiments.  If  the  same  total  number 
of  A-units  must  be  divided  among  the  source  sets  of  the  additional  R-units 
used  for  the  position  code,  however,  better  performance  is  obtained  with 
the  more  economical  configuration  code,  which  uses  binary  numbers  for 
identification,  with  larger  source  sets. 
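The trade-off in the last sentence is a matter of simple arithmetic, sketched below. The sketch assumes that a fixed pool of A-units is split evenly into disjoint source sets, one per R-unit, which is an illustrative simplification; the function names are hypothetical. The figure of 512 A-units is taken from the description of the Mark I, but the 26-class position code is shown only for comparison, since it needs more than the eight R-units the Mark I provides.

```python
def r_units_needed(n_classes, code):
    """Number of R-units required to give every class a unique code."""
    if code == "position":
        return n_classes                           # one R-unit per class
    if code == "configuration":
        return max(1, (n_classes - 1).bit_length())  # digits of a binary number
    raise ValueError(f"unknown code: {code}")

def source_set_size(total_a_units, n_classes, code):
    """A-units per R-unit when the pool is split into disjoint source sets."""
    return total_a_units // r_units_needed(n_classes, code)

for n_classes in (8, 26):
    for code in ("position", "configuration"):
        print(n_classes, code,
              r_units_needed(n_classes, code),
              source_set_size(512, n_classes, code))
```

With eight letters the configuration code needs three R-units instead of eight, so each source set can be well over twice as large when the pool of A-units is fixed.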
13. THREE-LAYER SYSTEMS WITH VARIABLE S-A CONNECTIONS


In the foregoing chapters, we have almost exhausted the
possible ramifications of minimal three-layer perceptrons, having an
S→A→R topology. Only one constraint remains to be dropped, in order
to obtain the most general system of this class: this is the requirement that
S to A-unit connections must have fixed values, only the A to R connections
being time-dependent. In this chapter, variable S-A connections will be
introduced, and the application of an error-correction procedure to these
connections will be analyzed. It would seem that considerable improvement
in performance might be obtained if the values of the S to A connections
could somehow be optimized by a learning process, rather than accepting
the arbitrary or pre-designed network with which the perceptron starts out.
It will be seen that this is indeed the case, provided certain pitfalls in the
design of a reinforcement procedure are avoided.

13.1  Assigned  Error,  and  the  Local  Information  Rule 


In order to apply an error correction procedure to all connections
of a perceptron, including the S-A connections, we must first re-examine
the concept of "error" which has been employed so far as a criterion for
reinforcement. In the theorem of Section 12.1, it was shown that it will
not do to assume that all units of the perceptron are equally in error when
a mistake in the total response occurs. It was seen that if all connections
are corrected, on the assumption that both R-units are wrong (in the two
R-unit case employed for demonstration) a solution may never be achieved.
The alternative was to assign an error independently to each R-unit, by a
suitable criterion, and correct the connections leading to each R-unit in
accordance with the corresponding error indication. In the present case,
where A-units as well as R-units are to have their input-connections modified,
it becomes necessary to assign an error indication to each A-unit, as well
as to each R-unit.

In preceding chapters, the assigned error for an R-unit, $E_r$,
was taken to be equal to $(R^* - r^*)$, where $R^*$ is the desired response,
and $r^*$ is the obtained response. A positive error meant that the R-unit
was to be turned to its positive state, and a negative error meant that it
was to be turned to its negative state, in the case of simple R-units.
Similarly, for an A-unit $a_i$, we might use a positive assigned error,
$E_i$, to indicate that the unit is to be turned "on", and a negative $E_i$
to indicate that it is to be turned "off", or made inactive, in response to
the current stimulus. The difficulty is that whereas $R^*$, the desired
response, is postulated at the outset, the desired state of the A-unit is
unknown. We can only say that we desire the A-unit to assume some state in
which its activity will aid, rather than hinder, the perceptron in learning
the assigned classification or response function.

One possible way of obtaining the required activity states of the
A-units would be to examine each possible state of the system, with its
corresponding G-matrix, and determine whether or not a solution to the
assigned problem exists. If a state is found in which a solution does exist,
then the appropriate responses can be taught to each A-unit, by means of a
standard error-correction procedure, operating on the A-units in the same
manner as on the R-units. Such an approach, however, evades the real
issue of finding a procedure which will guarantee convergence to a solution
without requiring that the reinforcement control system know the solution
state ahead of time. Specifically, in assigning an error-indication to an
A-unit, we wish to base the assignment only on the state of the network at
the time and locality where the error occurs. The following rule will
therefore be accepted as a working premise for all models to be considered:

LOCAL INFORMATION RULE: For any A-unit, $a_i$, the assignment of an
error $E_i(t)$ can depend only on information concerning the
activity or signals received by $a_i$, the value of its output
connections, and the error assignment at their terminal points
at time $t$.

In other words, only $a_i$ itself and the points to which it is directly
connected can determine the error assignment.

13.2 Necessity of Non-deterministic Correction Procedures

By  a  "deterministic  reinforcement  procedure"  we  mean  that  if 
the  same  state  of  the  system  should  occur  repeatedly  with  all  signals  and 
values  unchanged,  an  identical  reinforcement  will  be  applied;  and  that  if 
two  similar  subnetworks  are  in  the  same  state  of  activity,  value,  and  error 
assignment,  they  will  be  modified  identically.  Up  to  this  point,  no  problem 
has  been  found  for  which  a  solution  exists,  where  a  suitably  defined 
deterministic  reinforcement  procedure  could  not  find  a  solution.  The  first 
exception  to  this  is  stated  in  the  following  theorem. 




THEOREM  1: 


Given a three-layer series-coupled perceptron with
simple A and R-units, and variable-valued S-A connections,
and a classification C(W) for which a solution exists,
it may be impossible to achieve a solution by any deterministic
correction procedure which obeys the local information rule.

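PROOF: The proof is by example. Consider the following network:

[Figure: network diagram. Sensory units $s_1$ and $s_2$ are each connected to
both A-units $a_1$ and $a_2$; each A-unit is connected to both R-units $r_1$
and $r_2$.]

Let $a_1$ and $a_2$ have thresholds of 1, and let the stimuli of $W$ consist
of $s_1$ alone (stimulus $S_1$) or $s_2$ alone (stimulus $S_2$). Let the
required classification be $(r_1, r_2) = (+1, -1)$ for $S_1$ and $(-1, +1)$
for $S_2$.

A solution clearly exists; for example, the following assignment of values
would be satisfactory:

[Figure: network diagram with a satisfactory assignment of values.]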
In this problem, a solution clearly requires an asymmetric assignment of
values for "parallel" and "crossed" connections from each sensory unit and
from each A-unit. If we assume that all values are initially equal, then
either $a_1$ and $a_2$ are both on, or else both are off. In either case, one
of the R-units is wrong, and whichever one is wrong will induce a symmetric
correction of the values from both A-units. Moreover, since both $a_1$ and
$a_2$ are in indistinguishable states (whichever R-unit happens to be wrong),
under the local information rule both units must receive an identical error
indication. But then the connections from whichever S-unit is active will
both be modified identically, and the result is that the members of each
value-pair (from each S-unit and from each A-unit) are still identical. The
required asymmetry between "parallel" and "crossed" connections can therefore
never arise, and the same response must always occur for $S_1$ and $S_2$.
Q.E.D.
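The symmetry argument of the proof can be illustrated with a small simulation. This is a sketch under stated assumptions rather than a reproduction of the original figures: the network has two S-units, two A-units with threshold 1, and two R-units; the required classification is taken to be (+1, -1) for the first stimulus and (-1, +1) for the second; and the particular deterministic, locally informed correction rule in `deterministic_step` is only one hypothetical choice among many.

```python
def sign(x):
    return (x > 0) - (x < 0)

STIMULI = {"S1": (1, 0), "S2": (0, 1)}        # activity of (s1, s2)
TARGETS = {"S1": (+1, -1), "S2": (-1, +1)}    # assumed required (r1, r2)
THETA = 1                                      # A-unit threshold

def forward(w, v, s):
    """w[k][i]: value from s_k to a_i.  v[i][j]: value from a_i to r_j."""
    a = [1 if s[0] * w[0][i] + s[1] * w[1][i] >= THETA else 0 for i in range(2)]
    r = [sign(a[0] * v[0][j] + a[1] * v[1][j]) for j in range(2)]
    return a, r

def solves(w, v):
    return all(forward(w, v, STIMULI[n])[1] == list(TARGETS[n]) for n in STIMULI)

def deterministic_step(w, v, s, target, eta=1):
    """One correction using only local information and no randomness."""
    a, r = forward(w, v, s)
    e_r = [sign(target[j] - r[j]) for j in range(2)]
    e_a = []
    for i in range(2):
        total = 0
        for j in range(2):
            if e_r[j] == 0:
                continue
            agrees = sign(v[i][j]) == e_r[j]
            if a[i] and not agrees:
                total -= 1        # active unit feeding an error: push it off
            elif not a[i] and agrees:
                total += 1        # inactive unit that would help: push it on
        e_a.append(sign(total))
    for i in range(2):
        for j in range(2):
            v[i][j] += eta * a[i] * e_r[j]     # A-R correction
        for k in range(2):
            w[k][i] += eta * s[k] * e_a[i]     # S-A correction

def symmetric(w, v):
    return (all(w[k][0] == w[k][1] for k in range(2))
            and all(v[0][j] == v[1][j] for j in range(2)))

w = [[1, 1], [1, 1]]      # all values initially equal
v = [[1, 1], [1, 1]]
stayed_symmetric, ever_solved = True, False
for step in range(200):
    name = ("S1", "S2")[step % 2]
    deterministic_step(w, v, STIMULI[name], TARGETS[name])
    stayed_symmetric = stayed_symmetric and symmetric(w, v)
    ever_solved = ever_solved or solves(w, v)

# An asymmetric ("parallel" strong, "crossed" weak or negative) assignment.
solution_exists = solves([[1, 0], [0, 1]], [[+1, -1], [-1, +1]])
print(stayed_symmetric, ever_solved, solution_exists)
```

Because both A-units always see the same local state, every update treats them identically; the value pairs stay equal and the classification is never reached, even though the asymmetric assignment in the last lines satisfies it.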

While  this  theorem  shows  that  a  deterministic  procedure  cannot 
be  guaranteed  to  work,  it  remains  to  be  shown  that  a  non-deterministic 
procedure  will  work.  In  the  most  extreme  case,  we  could  employ  a  procedure 
which  randomly  varies  the  value  of  every  connection,  independently  of  the  others, 
as  long  as  errors  continue  to  occur.  In  this  case,  if  the  phase  space  of  the 
system  is  bounded,  a  solution  will  certainly  occur  in  finite  time,  but  we  have 
already  seen  the  devastating  consequences  of  a  much  less  drastic  randomization 
of the reinforcement process on learning time (cf. Figure 19). In the
following  section,  a  more  systematically  directed  procedure  is  presented, 
which  can  be  shown  to  lead  to  a  solution  with  probability  1,  as  in  the  case 
of  error  correction  procedures  considered  for  elementary  perceptrons. 
13.3 Back-Propagating Error Correction Procedures


The procedure to be described here is called the "back-propagating
error correction procedure" since it takes its cue from the
error of the R-units, propagating corrections back towards the sensory
end of the network if it fails to make a satisfactory correction quickly at
the response end. The actual correction procedure for the connections to
a given unit, regardless of whether it is an A-unit or an R-unit, is perfectly
identical to the correction procedure employed for an elementary perceptron,
based on the error-indication assigned to the terminal unit. Thus, if the
error $E_i$ is positive, a correction is applied to the values of the active
connections terminating on $a_i$ which would tend to increase the signal to
$a_i$ algebraically, eventually turning it "on"; if $E_i$ is negative, a
correction, $\eta$, of the opposite sign is applied to all active connections
terminating on $a_i$. The essential feature of the method is a probabilistic
procedure for assigning the errors, $E_i$.

The rules for the back-propagating correction procedure are
as follows:

1. For each R-unit, set $E_r = R^* - r^*$, where $R^*$ = required response
and $r^*$ = obtained response.

2. For each association unit, $a_i$, $E_i$ is computed as
follows, for each stimulus: Begin with $E_i = 0$.

a) If $a_i$ is active, and the connection $c_{ir}$ terminates
on an R-unit with a non-zero error $E_r$ which
differs in sign from $v_{ir}$, add -1 to $E_i$ with
probability $P_1$.

b) If $a_i$ is inactive, and the connection $c_{ir}$
terminates on an R-unit with an error $E_r$ which
agrees in sign with $v_{ir}$, add +1 to $E_i$ with
probability $P_2$.

c) If $a_i$ is inactive, and the connection $c_{ir}$ terminates
on an R-unit with an error $E_r$ which does not agree
in sign with $v_{ir}$ (or if $v_{ir}$ is zero), add +1 to $E_i$
with probability $P_3$.

For all other conditions, $E_i$ is not changed.

3. If $E_j \neq 0$, add $\eta$ to all active connections terminating
on the A or R-unit $u_j$, taking the sign of $\eta$ to agree
with the sign of $E_j$. In symbols,

$\Delta v_{ij} = a_i^* \, \mathrm{sgn}(E_j) \, |\eta|$

where $|\eta|$ is the magnitude of $\eta$.

In general, $P_1$ and $P_2$ are taken large relative to $P_3$. The effect of
these rules is to try to turn off any A-units (with probability $P_1$) whose
output is currently contributing to an error in an R-unit, and to try to turn
on any A-units (with probability $P_2$) which are currently off, but whose
output signals would help correct an error in one or more R-units if they
were on. The purpose of the third probability, $P_3$, is twofold: first,
if no A-units respond to a stimulus, and all of the values have the wrong
sign or are zero (as in typical initial conditions) it guarantees that some
A-units will come on; second, it prevents the permanent loss of A-units
which might be necessary for the proper response to some stimulus,
even though their values may have the wrong sign at some time during the
training procedure. If $P_1$ and $P_2$ are larger than $P_3$, the main changes
in the network will clearly all tend to go in the direction of a solution. The
following theorem proves that the procedure is sufficient to guarantee a
solution, if a solution exists, in the form of some assignment of values to the
network.
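The three rules can be sketched in code. This is an illustrative reading of the rules, not the program used in the simulation experiments. The function names, the list-based data layout, and the `rng` parameter are assumptions. A zero value on the connection of an active A-unit is treated here as "differing in sign" from a non-zero error, which the rules leave open.

```python
import random

def sign(x):
    return (x > 0) - (x < 0)

def r_errors(required, obtained):
    """Rule 1: error of each R-unit is required minus obtained response."""
    return [big_r - small_r for big_r, small_r in zip(required, obtained)]

def a_errors(active, v, e_r, p1, p2, p3, rng=random):
    """Rule 2: probabilistic error assignment for each A-unit.

    active[i] : 1 if A-unit i is on for the current stimulus, else 0
    v[i][r]   : value of the connection from A-unit i to R-unit r
    e_r[r]    : error assigned to R-unit r by Rule 1
    """
    errors = []
    for i, is_on in enumerate(active):
        e_i = 0
        for r, err in enumerate(e_r):
            agrees = err != 0 and sign(v[i][r]) == sign(err)
            opposes = err != 0 and sign(v[i][r]) != sign(err)
            if is_on:
                # 2a: active unit whose value works against the needed change.
                if opposes and rng.random() < p1:
                    e_i -= 1
            elif agrees:
                # 2b: inactive unit whose value would help correct the error.
                if rng.random() < p2:
                    e_i += 1
            elif rng.random() < p3:
                # 2c: inactive unit whose value disagrees or is zero.
                e_i += 1
        errors.append(e_i)
    return errors

def correct(input_values, input_activity, error, eta=1):
    """Rule 3: shift every active input connection by eta, signed like the error."""
    if error == 0:
        return list(input_values)
    return [val + eta * sign(error) * x
            for val, x in zip(input_values, input_activity)]

# One R-unit that answered -1 where +1 was required; three A-units.
e_r = r_errors(required=[+1], obtained=[-1])
active = [1, 0, 0]
v = [[-1], [+1], [-1]]
# Probabilities of exactly 1 and 0 are used only to make this demo repeatable.
print(a_errors(active, v, e_r, p1=1.0, p2=1.0, p3=0.0))
print(correct([10, 10, 10], [1, 0, 1], error=-1, eta=2))
```

In the demonstration the first A-unit is active with a negative value while the R-unit needs to move in the positive direction, so it receives a negative error; the second is inactive but would help, so it receives a positive error; the third is reached only through the third probability. Theorem 2 below requires all three probabilities to lie strictly between 0 and 1.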

THEOREM 2: Given a three-layer series-coupled perceptron, with
simple A and R-units, variable-valued S-A connections,
bounded A-R values, and a classification C(W) for which
a solution exists, then a solution to C(W) can be obtained
in finite time with probability 1 by means of a back-propagating
error-correction procedure, given that each
stimulus in $W$ always reoccurs in finite time, and that
probabilities $P_1$, $P_2$, and $P_3$ are all greater than 0
and less than 1.

PROOF: The state of the S-A network can be characterized, for present
purposes, by an $N_a$ by $n$ matrix, $A^*$, which consists of the $N_a$ row
vectors

$(a^*_{i1}, a^*_{i2}, \ldots, a^*_{in})$

where $a^*_{ij}$ = 1 or 0 = signal generated by unit $a_i$ in response to
stimulus $S_j$. Two assignments of values to S-A connections which yield
the same $A^*$-matrix will be called equivalent S-A states. To each such
matrix, $A^*$, there corresponds a G-matrix for the perceptron. We will say
that a given S-A state permits a solution if the corresponding G-matrix is
one for which a solution to C(W) exists.
First, suppose the system is initially in a state which permits
a solution. Then if it remains in this state sufficiently long, a solution must
occur with probability 1, due to Theorem 4, of Chapter 5. Since S-A
connections only change in value if the errors $E_i$ are assigned magnitudes
other than zero, and since the probabilities $P_1$, $P_2$, and $P_3$ of
assigning non-zero $E_i$ are all less than 1, there is a probability $p > 0$
that the perceptron will remain in its initial state for any given finite
time. Thus, there is a probability greater than zero that a solution will be
achieved before any change in the $A^*$-matrix occurs.

Next, suppose the $A^*$-matrix changes to some different state
before a solution is achieved, or suppose that the system starts out in a
state which does not permit a solution. Then it is sufficient to show that
the system will always return to a state which does permit a solution in
finite time with probability 1, and that the probability $P$ of obtaining a
solution for a given S-A state does not approach zero with successive
returns to the same state. If it does always return to such a state, then
each time it arrives at such a state, there will be a probability greater
than zero (and bounded away from zero) that it finds a solution before the
state is destroyed. Thus, with sufficiently many returns to states which
permit solutions, a solution will be found with probability 1.
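The last step of this paragraph can be written out explicitly. As a sketch, suppose that on every return to a state permitting a solution, and whatever has happened before, the probability of finding a solution before the state is destroyed is at least some fixed $p_{\min} > 0$. The symbol $p_{\min}$ is introduced here for illustration and does not appear in the original argument. Then the chance of still having no solution after $k$ returns shrinks geometrically:

```latex
\Pr[\text{no solution after } k \text{ returns}]
  \;\le\; (1 - p_{\min})^{k}
  \;\xrightarrow[k \to \infty]{}\; 0 .
```

This is why the probability must be bounded away from zero and not merely positive: if the success probability on the $k$-th return were allowed to shrink quickly with $k$, the product of the failure probabilities could remain positive, and a solution would no longer be certain.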

It is now necessary to show that from an arbitrary starting
state, the system will always achieve an $A^*$-matrix which permits a
solution in finite time with probability 1.


If the current $A^*$-matrix does not permit a solution, then
either or both of the following conditions must be present:

(a) Some $a^*_{ij}$ which should be 1 for a solution to be
possible is actually 0;

(b) For some $a^*_{ij}$ which should be 0 and is actually 1,
there must be a $v_{ir}$ the sign of which disagrees with
$R^*$ for stimulus $S_j$.

The second condition follows from the fact that if every active connection
from A to R-units has a $v_{ir}$ with proper sign for every $S_j$, and if
condition (a) is not present, then a solution already exists. Now suppose,
for an arbitrary $A^*$-matrix, stimulus $S_j$ occurs. Then condition (a) may
exist for some A-units, and condition (b) for others. For each A-unit
which is currently off (including all of those to which condition (a) applies)
Rule 2b or 2c of the correction procedure becomes operative, and there is
some probability that each such unit will receive an error indication. Since
we have assumed the activity of these units to be necessary for a solution,
and have postulated that a solution exists, there must be some assignment
of S-A values for each such unit which will turn it "on" for $S_j$. Since
$S_j$ is postulated to reoccur infinitely many times, it follows from
Theorem 4 of Chapter 5 (treating the A-unit and its input connections as
equivalent to an R-unit) that the required $a^*_{ij}$ will ultimately be
obtained. Since each A-unit is corrected independently of the others, a state
will ultimately occur in which all of the A-units which were wrong by
condition (a) have been corrected. Next consider those A-units for which
condition (b) applies. For these units Rule 2a of the error correction
procedure is applicable, and by the same argument as above, the $a^*_{ij}$
will ultimately all be corrected. But in that case, we have arrived in a
state which permits a solution. Since there is nothing in the above argument
which depends on states prior to the arbitrary starting state, the system can
arrive at states permitting solutions indefinitely often, and a solution must
therefore occur with probability 1, provided the probability $P$ of finding a
solution while in such a state does not approach zero. This last assumption,
though plausible, still remains to be rigorously proven for the general case.

For the special case in which the values $v_{ir}$ are bounded, the
remaining assumption can be proven without difficulty. In the proof of
Theorem 4, in Chapter 5, it was shown that the number of corrections
necessary to find a solution is at most equal to

A1(4  f  6i7r/ 

a  ((  -  n 

a bound whose constants depend only on the G-matrix (and therefore on
$A^*$) and on the length of the starting vector of values. Thus the
number of corrections required to find a solution can increase only as a
result of an increase in the magnitude of some components of the starting
vector upon successive returns to the same S-A state. But if all
values are bounded, the components of the starting vector are also bounded.
Consequently, its length has an upper bound for any given G-matrix (or for
any given $A^*$).

This means that there is a maximum number of corrections that might
possibly be required (assuming that a solution exists) and that the
probability $P$ of arriving at a solution before destruction of the $A^*$
state is not only greater than zero but must be bounded away from zero.
Q.E.D.
13.4 Simulation Experiments


At the present time, no quantitative theory of the performance
of systems with variable S-A connections is available. A number of
simulation experiments have been carried out by Kesler, however, which
illustrate the performance of such systems in several typical cases, shown
in the accompanying figures. In order to show the performance of the
variable S-A system to its best advantage, small perceptrons were used, for
which the learning of a horizontal/vertical bar discrimination (Experiment 6)
falls short of what might be obtained with an optimum S-A organization.

Figure 38 illustrates the effect of various combinations of the
probabilities $P_1$, $P_2$, and $P_3$ (including the 0,0,0 case where all S-A
connections remain fixed, for comparison). The curves show the mean
performance for 20 perceptrons, with 50 A-units, having 10 input connections
to each. The initial values of all S-A connections are set equal to +10, and
the threshold is 50. The same set of 20 networks and training sequences
was used for each probability combination.

It is found that if the probabilities of changing the S-A
connections are large, and the threshold is sufficiently small, the system
becomes unstable, and the rate of learning is hindered rather than helped
by the variable S-A network. Under such conditions, the S-A connections
are apt to change into some new configuration while the system is still
trying to adjust its values to a solution which might be perfectly possible
with the old configuration. Better performance is obtained if the rate of
change in the S-A network is sufficiently small to permit an attempt at
solving the problem before drastic changes occur. To improve the stability
of the network, in all experiments shown here, the A-R connections are
reinforced, for each stimulus, before determining whether a correction should
be propagated back to the S-A network. Thus, S-A connections are changed
only if the system fails to correct an error at the A-R level.

The experiments were carried out with the Burroughs 220 computer at
Cornell University, and the IBM 704 at the A.E.C. Applied Mathematics
Center at New York University.

In Figure 39, mean performances of a number of 20 A-unit
perceptrons are shown, in one case with 4 connections, and in a second
case with 50 connections to each A-unit. These perceptrons are small enough
so that in many cases we would expect no solution to exist to the
horizontal/vertical bar problem (which requires the classification of 40
stimuli with only 20 A-units) were it not for the modifiable S-A network.
Initial values of S-A connections are again equal to 10, and thresholds are
$2m$, where $m$ = number of connections to each A-unit. Note that with 50
fixed connections to each A-unit the performance is poorer than with only
4 connections, but that
with  Pf-  .9 ,  p2  "  ■  ■’  and  ^  »  the  performance  overtakes  the  4-connection 

model. This is because with large numbers of S-A connections, the
perceptron can effectively take its pick of whatever organization might be most
helpful, and can always reduce excess connections to zero value, while
with only a small number of connections at its disposal it is seriously limited
in its potentialities. With only 4 connections, variable S-A connections have
little effect on performance.

These  experiments  suggest  that  the  best  performance  will 
generally  be  obtained  by  taking  ^  ^ 2  ^  ^3  ' 
Figure 38 BACK-PROPAGATING ERROR-CORRECTION EXPERIMENTS: MEANS OF 20 PERCEPTRONS
$N_a = 50$, $\theta = 50$, 10 CONNECTIONS TO EACH A-UNIT. HORIZONTAL/VERTICAL
BAR DISCRIMINATION (EXPT. 6). (0-SIGNALS COUNTED CORRECT WITH .5
PROBABILITY).
An interesting application of the variable S-A system is in
pre-conditioning a perceptron for stimuli of a particular type (such as line
figures, or blob patterns) by giving it a number of discrimination tasks to
perform on typical material of the given type, and then trying to teach it a
new discrimination on the same kind of stimuli. Due to the prior adaptation
of the S-A system, it is to be expected that the learning curve for the final
discrimination task should show faster learning after the period of
pre-conditioning than if the same discrimination task had been attempted with
the original randomly organized S-A network. In other words, the S-A
network should become adapted to the stimuli of a particular kind of universe,
performing better on typical discrimination tasks involving "familiar" kinds
of stimuli than on tasks involving radically different or "unfamiliar" kinds
of stimuli.
14. SUMMARY OF THREE-LAYER SERIES-COUPLED SYSTEMS:
CAPABILITIES AND DEFICIENCIES


The three-layer series-coupled perceptron (S→A→R perceptron)
is the least complicated topological organization which yields fully general
response-capabilities. The analysis presented in the preceding chapters
leads, in effect, to the following conclusion: With a suitable design and
training procedure, a three-layer series-coupled perceptron can be taught
to duplicate the performance of any finite automaton. This means that if we
have a finite universe of potential input sequences ($J_1, J_2, \ldots$) and
a finite set of possible response sequences, then it is
possible to construct a minimal perceptron such that any response sequence
can be associated with each possible input sequence, $J_i$. In order
to do this with full generality, of course, a suitable spectrum of time delays
must be present, as indicated in Chapter 11.

Both  the  generality  and  the  practical  limitations  of  the  above 
statement  should  be  emphasized.  It  is  perfectly  possible,  in  principle,  to 
teach  a  minimal  perceptron  to  duplicte  the  performance  of  an  arbitrary  digital 
computer.  To  do  this,  every  possible  sequence  of  coded  instructions  and  data 
must  be  represented  as  a  stimulus  sequence  (one  of  the  J;  )  and  the  set  of 
output  numbers  generated  by  the  computer  as  a  response  sequence  (one  of 
the  (4  •  ),  If  the  perceptron  is  large  enough,  it  can  then  be  trained,  with 

an  error  correction  procedure,  to  make  the  appropriate  association  of  input 
and  output  sequences.  But  what  the  perceptron  learns  by  this  process  is  to 
simulate  the  behavior  of  the  digital  computer;  it  does  not  acquire  the 


-303- 


computer's  logic.  If  any  one  of  the  trillions  of  possible  programs  were 
omitted  from  the  training  sequence,  the  perceptron  would  probably  fail  to 
perform correctly if tested on the omitted sequence. The failure to generalize,
or to learn logical rules, in such a problem makes such an application
of  these  minimal  perceptrons  totally  impractical. 

For  practical  purposes,  we  will  limit  our  remarks  to  the 
performance  of  these  perceptrons  in  recognizing  and  reporting  environmental 
events.  In  this  connection,  the  following  capabilities  have  been  established: 

(1) A three-layer series-coupled perceptron can be
taught to associate an arbitrary coded output, or sequence of outputs, $\mathcal{R}_j$,
to each stimulus, or stimulus sequence, $\mathcal{S}_i$, in a finite environment.

(2)  The  perceptron  need  not  be  explicitly  designed  for  the 
task  which  it  is  required  to  learn.  The  same  network  may  be  taught  a 
variety  of  alternative  outputs,  or  codifications,  of  the  same  environment. 

(3)  The  required  training  can  be  accomplished  by  means  of 
an  arbitrary  sequence  of  events  from  the  specified  environment,  regardless 
of  the  order  or  frequency  with  which  they  occur,  provided  each  event 
ultimately  reoccurs  in  finite  time. 

(4) The training can be accomplished regardless of the
initial state of the perceptron's memory, and without specifying in detail
the changes which must take place in the state of the system (i.e., general
dynamic laws are sufficient to bring about the required adaptation).

(5) A perceptron will tend to assign the same response to
any two stimuli or stimulus sequences, $\mathcal{S}_i$ and $\mathcal{S}_j$, which are close to
identity under temporal translation. By means of discrimination training,
however, it can be made to associate a different response to each such
stimulus.
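
Capabilities (1), (3) and (4) can be illustrated with a minimal sketch in Python. It is not the construction used in the preceding chapters: it assumes a hypothetical A-layer containing one A-unit per stimulus (each unit firing only for its own stimulus), a single R-unit, and a unit-step error-correction rule applied to the A-to-R values. The names `a_layer`, `respond` and `train` are illustrative only.

```python
import random

def a_layer(stimulus, patterns):
    """Fixed S->A stage (assumed): A-unit k fires only when the stimulus equals patterns[k]."""
    return [1 if stimulus == p else 0 for p in patterns]

def respond(weights, activity):
    """R-unit output: +1 if the weighted A-unit signal is positive, else -1."""
    u = sum(w * a for w, a in zip(weights, activity))
    return 1 if u > 0 else -1

def train(patterns, targets, weights, rng, max_passes=100):
    """Error-correction training of the A->R values only.

    Stimuli are reshuffled on every pass, so each one recurs but in no
    fixed order. Returns the total number of corrections applied.
    """
    corrections = 0
    order = list(range(len(patterns)))
    for _ in range(max_passes):
        rng.shuffle(order)
        errors = 0
        for k in order:
            act = a_layer(patterns[k], patterns)
            if respond(weights, act) != targets[k]:
                # Move the values of the active A-units toward the target.
                for j, a in enumerate(act):
                    weights[j] += targets[k] * a
                errors += 1
        corrections += errors
        if errors == 0:
            break
    return corrections

rng = random.Random(0)
# All 8 stimuli on a 3-point retina, with an arbitrary dichotomy (parity).
patterns = [tuple((k >> b) & 1 for b in range(3)) for k in range(8)]
targets = [1 if sum(p) % 2 else -1 for p in patterns]
# Arbitrary initial memory state.
weights = [rng.randint(-5, 5) for _ in patterns]
n_corrections = train(patterns, targets, weights, rng)
learned = [respond(weights, a_layer(p, patterns)) for p in patterns]
print(learned == targets, n_corrections)
```

The sketch also makes the cost visible: the A-layer here grows with the number of stimuli, and nothing learned for one stimulus carries over to any other.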


With  this  kind  of  universality  in  the  performance  of  the  system, 
we  obviously  cannot  hope  to  find  any  new  kinds  of  response  capabilities  in 
more  complex  or  sophisticated  networks,  which  cannot  be  realized  by 
minimal  perceptrons  after  suitable  training.  Nonetheless,  the  three-layer 
series-coupled perceptron clearly falls far short of biological systems in
some respects. The differences lie not in what the system can learn to do,
but rather in the speed, efficiency, economy, and reliability of learning or
adaptation. An S→A→R perceptron can be taught to play a game, such as
checkers, only by teaching it what response to make in every conceivable
situation; a biological system can anticipate most of this training by
learning the rules of the game. Or, similarly, an S→A→R perceptron can
distinguish  a  circle  from  a  triangle  in  the  lower  half  of  its  retina  only  if  it 
has  previously  been  trained  with  triangles  and  circles  in  the  lower  half  of 
its  retina;  it  will  not  generalize  from  experience  with  similar  forms  in  the 
upper  half  of  the  field.  In  Nature,  the  enormous  number  of  sensory  situations 
which  comprise  the  potential  universe  (each  situation,  individually,  having 
exceedingly  low  probability  of  occurrence)  makes  the  capabilities  of 
generalization,  analysis,  and  abstraction  absolutely  essential  for  an 
advanced  organism,  or  recognition  device,  to  function  properly.  Two  main 
ingredients of such performance are recognition of similarity and recognition
of functional parts, or entities. The first of these is basic to generalization
and induction, while the second is basic to analysis, the abstraction
of  relations,  and  the  reduction  of  complex  situations  to  familiar  terms. 
Seen  in  this  light,  the  principal  deficiencies  of  these  minimal-topology 
perceptrons  are: 

(1)  An  excessively  large  system  may  be  required. 

(2)  The  learning  time  may  be  excessive. 

(3) The system may be excessively dependent on external
evaluation (by an independent r.c.s.) during learning.

(4) The generalizing ability (inductive ability) is insufficient.

(5)  Ability  to  separate  essential  parts  in  a  complex 
sensory  field  (analytic  ability)  is  insufficient. 

Point (1) is largely attributable to (5); the excessive size of
the perceptrons necessary to deal with complex environmental situations
is due largely to the necessity of having a characteristic set of A-units
representing every possible sensory field or sequence in its entirety. A
preliminary coding of the field in terms of its parts and relations would
greatly reduce the size of the system required to describe a given universe
of situations. To take an extreme case, if a three-layer series-coupled
perceptron is required to produce as an output the coded representation of
the sum of a sequence of a million digits, it must be capable of representing
in its association system every possible sequence of a million digits
(presented either serially or simultaneously): $10^{1{,}000{,}000}$ possibilities in all.
On the other hand, a perceptron which could attend selectively to each
digit, form a partial sum, and then go on to the next digit, requires only $10^8$
possible states: $10^7$ to represent the possible values of the partial sum,
multiplied by a factor of ten to allow for each of the possible incoming digits.
The second method is the one employed by a digital computer, or a man
adding a sequence of numbers. In the field of sensory pattern recognition,
similar conditions occur. The recognition of a sentence is made much
easier by breaking it into words, and the recognition of a scene is made
easier by analyzing it into objects and relations.
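
The arithmetic of this comparison can be checked with a short sketch. The function names are illustrative; the only assumption is that the digits are decimal, so that a partial sum of n digits lies between 0 and 9n.

```python
def lookup_table_log10(n_digits):
    """log10 of the number of distinct digit sequences of length n_digits."""
    return n_digits  # there are 10**n_digits such sequences

def serial_adder_states(n_digits):
    """States needed to hold a (partial sum, incoming digit) pair."""
    partial_sums = 9 * n_digits + 1  # sums range over 0 .. 9*n_digits
    return partial_sums * 10         # ten possible incoming digits

n = 1_000_000
print(lookup_table_log10(n))   # exponent of the whole-sequence count
print(serial_adder_states(n))  # exact state count for the serial method
```

The exact serial count falls just under the round figure of $10^8$ quoted in the text, which allows a full $10^7$ values for the partial sum.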


Similarly,  the  excessive  learning  time  (point  2)  can  be  largely 
attributed  to  (4),  the  insufficient  generalizing  ability  of  the  system.  With 
improved  generalization,  several  examples  should  be  sufficient  to  teach 
the  perceptron  to  recognize  all  members  of  a  class  of  similar  events, 
whereas  at  present  an  unduly  large  sample  is  required  in  order  to  extend 
the  response  over  the  class.  The  insufficient  generalizing  capability  has 
been  frequently  pointed  out  in  the  preceding  chapters,  and  is  common  to 
all of the S→A→R perceptrons. Thus points (3), (4) and (5) appear to be
the  primary  deficiencies. 


In  connection  with  point  (3),  we  note  the  failure  of  minimal 
perceptrons  to  reach  "useful"  terminal  states  under  R-controlled 
reinforcement  procedures,  except  under  exceptional  environmental  and 
organizational conditions. This means that the reinforcement control
system must itself have a great deal of information about the environment,
and must generally know, or have built into it, the precise discrimination
or response functions which the perceptron is supposed to learn. Thus the
r.c.s. must either be a free agent (e.g., a human trainer) or else some
kind of homunculus within the same physical system as the perceptron. It
has been noted that a perceptron can improve over the performance of the
r.c.s. in some cases (Section 8.1.4) but the functioning of the r.c.s. still
seems to be rather remote from what might be expected of a biological
motivating system. By using a random-sign correction procedure, the
information required from the r.c.s. is minimized; with such a procedure,
the possible outputs of the r.c.s. can be interpreted to mean "hold steady"
or "change", while with a directed correction procedure the three alternatives
"hold steady", "increase values", or "decrease values" are all
required. But the efficiency of a system employing the randomized
procedure is greatly reduced (cf. Figure 19) and the only hope for such
systems seems to be in a "majority rule" procedure, which increases the
size and complexity of the total organization.

If  a  system  could  be  contrived  which  would  guarantee 
generalization  of  a  response  from  one  stimulus  of  a  class  to  all  other 
stimuli of that class, an r.c.s. which employs the "trial-and-error"
process of the random-sign procedure might become practical, and a
simple motivation system which senses only the suitability or unsuitability
of the present response or state of the organism might be substituted for
the more complicated r.c.s. assumed for most of the preceding experiments.
In Part III, it will be shown that multi-layer and cross-coupled
perceptrons are capable of providing just this sort of generalizing capability,
and, moreover, that this capability may be "self-organizing" under
reasonable environmental conditions. That is to say, R-controlled systems
can learn to form reasonable classes on the basis of a similarity criterion,
provided there is some support for this organization from the environment.
The required support takes the form of a "continuity constraint", which says,
in effect, that stimuli do not occur as momentary flashes, but are more
likely to persist for a time, during which they undergo a series of movements
or transformations. It will be seen that such a sequential organization
provides sufficient information to enable a multi-layer or cross-coupled
perceptron to abstract a concept of similarity which can then be employed
to obtain immediate generalization in later situations.

The  improvements  which  have  been  demonstrated  to  date  in 
multi-layer and cross-coupled perceptrons will be seen to be primarily
in the field of generalization phenomena, and their main virtue is in
reducing the learning time of a perceptron. Some reductions in size
requirements have also been demonstrated, and the dependence on
external evaluation of performance is largely eliminated. Thus points (1)
through (4), in the list of criticisms of minimal perceptrons, can be largely
or entirely eliminated with a multi-layer or cross-coupled topology.
Point (5), however, remains the least understood of the current problems.
While there is some indication that perceptrons of the types to be considered
in Part III may have some analyzing ability (for example, they can
isolate contours from solid figures, and may possibly learn to suppress
the partial response of the association system to irrelevant aspects of the
stimulus field) it is not yet possible to say whether such systems are really
sufficient to meet the challenge of point (5), or not. The psychological
problems of figure-ground organization, recognition of relations, and
"cognitive set" are all involved here. It is likely that "back-coupled
perceptrons", in which R-units or deep association layers feed back to
more superficial layers, may be necessary to deal with these problems.
Several possible approaches will be considered in Part IV, which deals
with current problems, and attempts to establish directions for future
study.

PART III


MULTI-LAYER AND CROSS-COUPLED PERCEPTRONS


15. MULTI-LAYER PERCEPTRONS WITH FIXED PRETERMINAL NETWORKS


The perceptrons considered in Part II have all consisted
of three "layers" of signal generating elements: a sensory layer, a single
layer of association units, and a layer of R-units (containing only a single
unit in the case of simple perceptrons). A perceptron with additional layers
of A-units between S and R-units will be called a multi-layer system. Thus
the network diagram

[network diagram missing in source]

represents a four-layer series-coupled system, whereas the diagram

[network diagram missing in source]

represents a three-layer cross-coupled system, since all A-units are at
least the same logical distance from the sensory units (see Definition 18,
Chapter 4). The three-layer structure of the second diagram can be made
clearer if it is drawn in the form

[network diagram missing in source]

which is topologically identical to the preceding network. Cross-coupled
systems will be considered in detail in the following chapters.

It has been demonstrated that three-layer, series-coupled
perceptrons are capable of learning any type of classification, or
associating  any  responses  to  stimuli  or  to  sequences  of  stimuli,  that 
might  possibly  be  required.  Therefore,  if  a  multi-layer  topology  is  to 
offer  any  functional  advantages,  it  will  not  be  in  the  form  of  new  kinds  of 
responses  to  stimuli  (since  any  such  response  can  be  achieved  with  a 
three-layer  system)  but  rather  in  increased  efficiency  in  the  acquisition 
of  such  responses.  It  can,  in  fact,  be  demonstrated  that  the  adaptability, 
or  ease  of  acquisition  of  responses,  may  be  greatly  improved  with  a 
suitable  multi-layer  topology.  The  most  striking  improvements  are  to 
be  found  in  the  generalizing  ability  of  such  networks  --  an  ability  to  give 
appropriate  responses  to  stimuli  for  which  they  have  not  been  taught.  It 
has  been  seen  that  this  "inductive"  or  generalizing  capability  is  present 
only in rudimentary form in three-layer series-coupled systems. Some
multi-layer systems also show improvements in sensitivity to differences
between  highly  similar  stimuli,  making  such  discriminations  easier  to 
learn,  as  will  be  seen  in  Section  15.1. 

In the following sections, we will first consider systems in
which all connections other than connections to R-units have fixed values,
only the R-unit input connections being reinforced. The connections to the
R-units will be called terminal connections, all other connections (from S
to A-units, and A-units to other A-units) being called preterminal connections.
It will be seen in Section 15.2 that the most interesting effects which can be
obtained by such systems depend on special constraints in the organization of
the preterminal network. The following chapter will therefore be devoted to
the examination of dynamic rules by which the preterminal connections
between layers of A-units can be modified, so as to yield the required organizations
as a result of the system's adaptive functioning, in a suitably organized
environment.
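
The distinction between terminal and preterminal connections can be made concrete with a small sketch of a four-layer system. Everything specific here is an assumption made for illustration: the layer sizes, the numbers of excitatory and inhibitory origins per unit, the threshold, origin points drawn with replacement, and the unit-step error-correction rule. The sketch shows only that reinforcement touches the terminal values and leaves the preterminal network fixed; it makes no claim about how well such a network learns.

```python
import random

def make_layer(n_out, n_in, n_exc, n_inh, rng):
    """Fixed random connections: each unit draws n_exc (+1) and n_inh (-1) origin points."""
    layer = []
    for _ in range(n_out):
        row = [0] * n_in
        for _ in range(n_exc):
            row[rng.randrange(n_in)] += 1
        for _ in range(n_inh):
            row[rng.randrange(n_in)] -= 1
        layer.append(row)
    return layer

def fire(layer, signal, theta):
    """Threshold units: output 1 when the summed input reaches theta."""
    return [1 if sum(w * s for w, s in zip(row, signal)) >= theta else 0
            for row in layer]

def forward(stimulus, pre, terminal, theta=1):
    """S -> A(1) -> A(2) through the fixed layers, then the terminal values."""
    a1 = fire(pre[0], stimulus, theta)
    a2 = fire(pre[1], a1, theta)
    u = sum(v * a for v, a in zip(terminal, a2))
    return (1 if u > 0 else -1), a2

def reinforce(stimulus, target, pre, terminal):
    """Error correction applied to the terminal connections only."""
    response, a2 = forward(stimulus, pre, terminal)
    if response != target:
        for j, a in enumerate(a2):
            terminal[j] += target * a
    return response

rng = random.Random(1)
pre = [make_layer(30, 16, 3, 2, rng), make_layer(30, 30, 3, 2, rng)]
terminal = [0] * 30
frozen = [[row[:] for row in layer] for layer in pre]  # copy for comparison

stimuli = [[rng.randint(0, 1) for _ in range(16)] for _ in range(4)]
targets = [1, -1, 1, -1]
for _ in range(20):
    for s, t in zip(stimuli, targets):
        reinforce(s, t, pre, terminal)

print(pre == frozen)  # the preterminal network is unchanged by training
```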

The  analysis  of  multi-layer  systems  is  of  interest  not  only  in  its 
own  right,  but  also  because  it  introduces  many  of  the  problems  and  formal 
techniques  of  analysis  which  will  be  encountered  in  the  following  chapters  on 
cross-coupled systems, with feed-back loops within the network. In fact, it
is found that with a suitable transformation, many "closed-loop" cross-coupled
systems can be represented by an equivalent "open-loop" multi-layer
system.

15.1 Multi-layer Binomial and Poisson Models

The most straightforward extension of our previous models to
a multi-layer topology is to assume that each A-unit in the first association
layer is assigned an origin point configuration in the retina, or sensory
layer, chosen independently for each A-unit, as before. Each A-unit in the
second layer (designated $A^{(2)}$) is similarly assigned an origin point configuration
in the $A^{(1)}$ layer, independently for each such A-unit. In general,
every A-unit in the $A^{(\lambda)}$ layer is independently assigned an origin point
configuration from an appropriate distribution (binomial or Poisson model),
the connections originating from the $A^{(\lambda-1)}$ layer. All connections from one
A-layer to the next are assumed to be fixed in value, the final A-layer sending
variable-valued connections to the R-units. In order to analyze the performance
of such a perceptron, it is sufficient to determine the Q-functions for
the A-units of the last layer before the R-unit, since, given these Q-functions,
we can then apply the same equations and analysis which were employed
in Part II, for three-layer perceptrons. The notation $Q_i^{(1)}$, $Q_{ij}^{(1)}$ will be
used to denote the Q-functions for A-units in the first layer (which are
identical with the Q-functions discussed in Chapter 6), and $Q_i^{(\lambda)}$, $Q_{ij}^{(\lambda)}$ to
denote Q-functions for units in the $\lambda$th layer.

Even in the simplest case, of a four-layer perceptron, the
combinatorial analysis required for a rigorous statement of the $Q^{(2)}$ functions
is awe-inspiring. A special case, in which all inter-layer connections are
inhibitory, and the thresholds of all $A^{(2)}$ units are zero, has been
analyzed by Joseph (Ref. 41), and the reader is referred to his contribution
for the detailed considerations. The basic difficulty stems from
the fact that a second layer Q-function, such as $Q_{ij}^{(2)}$, depends on the
distribution of the numbers of A-units in the first layer which respond
to $S_i$ alone, $S_j$ alone, and jointly to $S_i$ and $S_j$. The expected
values of these numbers are obtainable from the $Q^{(1)}$ functions in a
straightforward manner, but the non-central moments of the distributions
enter into the analysis in such a way that it becomes unduly complicated.

A practical solution is obtained by assuming that the numbers
of A-units in the 1st, 2nd, ..., $(\lambda-1)$th layers (designated by
$N_a^{(1)}, N_a^{(2)}, \ldots, N_a^{(\lambda-1)}$) are all very large, or infinite. In this case,
the proportion of active units in each layer in response to $S_i$ will be
equal to $Q_i^{(\lambda)}$, and the expected values of all set-intersections can be
employed in the analysis. In this case, the equations of Chapter 6 can
be employed without modification to compute $Q_{ij}^{(\lambda)}$, by using $Q_i^{(\lambda-1)}$
in place of the stimulus area, $Q_{ij}^{(\lambda-1)}$ in place of the intersection,
etc. The error introduced by assuming infinite $N_a$ for the preterminal
layers will be slight, as long as the actual $N_a$ is reasonably large.
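
As a sketch of what this substitution amounts to for a single stimulus, suppose a binomial model in which every A-unit of layer $\lambda$ has $x$ excitatory and $y$ inhibitory connections, and suppose a unit is active when its number of active excitatory inputs exceeds its number of active inhibitory inputs by at least $\theta$. This firing rule, and the explicit form below, are assumptions made here for illustration; they are not quoted from Chapter 6. With an effectively infinite layer $\lambda - 1$, each origin point is active independently with probability $q = Q_i^{(\lambda-1)}$, so that

```latex
Q_i^{(\lambda)} \;=\;
\sum_{e=0}^{x} \; \sum_{\substack{m=0 \\ e-m \,\ge\, \theta}}^{y}
\binom{x}{e}\binom{y}{m}\, q^{\,e+m}\,(1-q)^{\,x+y-e-m},
\qquad q = Q_i^{(\lambda-1)} .
```

For the first layer, $q$ is the stimulus area (the proportion of sensory points activated by $S_i$) rather than a Q-function of a preceding layer.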

The addition of extra A-unit layers can have one of several
interesting effects, depending upon the parameters $x$, $y$, and $\theta$
(or $\bar{x}$, $\bar{y}$, and $\theta$ in a Poisson model) for each layer. The special
case of inhibitory connections and zero thresholds was investigated by
Joseph (Ref. 41), who finds that by optimizing the number of input
connections to each layer, so as to achieve highest probability of correct
recognition, $Q_i$ approaches a constant as the number of layers increases,
regardless of the size of the stimuli or the dichotomy which the perceptron
is required to learn. At the same time, $Q_{ij}$ approaches $Q_i^2$, $Q_{ijk}$
approaches $Q_i^3$, etc. In effect, this represents a condition in which, in
the terminal association layer, a statistically independent set of A-units
responds to each stimulus in the environment. The consequence is that
all discriminations become equally easy. Specifically, it was found that
the ratio [...] for 100 A-units in the terminal layer approaches
1.941 as the number of layers is increased, with an environment of 40
stimuli. A comparison with Table 3, in Chapter 7, shows that this
performance is less than would be achieved with a three-layer perceptron
for the task of discriminating horizontal from vertical bars, but it is
considerably better than the performance of a three-layer perceptron
on a more difficult task, such as the odd-even bar discrimination illustrated
in Table 4. Thus the addition of extra association layers can be used to
improve discrimination in difficult problems, but only at the cost of reduced
generalizing ability, since two adjacent stimuli with a large intersection are
now no more closely related (in the $\lambda$th layer) than two totally disjoint
stimuli.


In Joseph's model, with all inhibitory connections, the above
results are obtained only by optimizing the number of connections to each
new layer of A-units. If, instead of carrying out this optimization, a fixed
number of connections is assumed for all A-units in the system, the
perceptron will be unstable, and will tend to develop oscillations such that
alternate A-layers are totally "on" or totally "off", making all discrimination
impossible. Moreover, it is to be expected that a model which has
been optimized for one environment, with a given size of stimuli, will be
unstable in a different environment, with a slightly different size of stimuli.
In more practical cases, a mixture of excitatory and inhibitory connections
must be used, with thresholds greater than zero, in order to guarantee
stability and convergence of $Q_i$ for a range of environmental variations.
Clearly, if $x < y + \theta$, $Q_i^{(\lambda)}$ will not go to 1 as $\lambda$ increases. If $x = y$,
a suitable choice of $\theta > 0$ will generally guarantee, as well, that $Q_i^{(\lambda)}$
will not go to zero.* From Figure 7(b), for example, it is clear that if
$x = y = 5$, and $\theta = 1$, an equilibrium should occur at about $Q_i = .37$,
since at this point $Q_i^{(\lambda)} = Q_i^{(\lambda-1)}$. If $Q_i^{(\lambda-1)}$ should rise above .37,
we will have $Q_i^{(\lambda)} < Q_i^{(\lambda-1)}$, while if $Q_i^{(\lambda-1)}$ falls below .37 we
will have $Q_i^{(\lambda)} > Q_i^{(\lambda-1)}$. If we increase the amount of inhibition by
making x = [...], y = [...], then (from the same Figure) we find that the
equilibrium value of $Q_i$ is reduced to .14. If the inhibition is increased
still further (e.g., to [...], as in the bottom curve of Fig. 7b)
the equilibrium value of $Q_i$ is zero, and no matter how large a stimulus
is presented, activity will die away entirely in the "deeper" association
layers.

* This observation will generally not be valid for a small perceptron,
where the actual level of activity may go to zero in one of the layers,
due to random variations in the network. In this case, $Q_i^{(\lambda)}$ will be
zero for all subsequent layers. Thus, for a finite system, $Q_i^{(\lambda)}$ [...]
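
The equilibrium argument for $x = y = 5$, $\theta = 1$ can be checked numerically with a short sketch. It assumes the infinite-layer approximation described above and a binomial firing rule in which a unit is active when its active excitatory inputs exceed its active inhibitory inputs by at least the threshold; the function names `next_q` and `equilibrium` are illustrative, and the rule itself is an assumption rather than a quotation from Chapter 6.

```python
from math import comb

def next_q(q, x, y, theta):
    """Q for the next layer when each origin point is active with probability q."""
    total = 0.0
    for e in range(x + 1):      # number of active excitatory inputs
        p_e = comb(x, e) * q**e * (1 - q)**(x - e)
        for m in range(y + 1):  # number of active inhibitory inputs
            if e - m >= theta:
                total += p_e * comb(y, m) * q**m * (1 - q)**(y - m)
    return total

def equilibrium(q0, x, y, theta, layers=200):
    """Carry an initial activity level q0 through many fixed layers."""
    q = q0
    for _ in range(layers):
        q = next_q(q, x, y, theta)
    return q

for q0 in (0.05, 0.5, 0.95):
    print(q0, round(equilibrium(q0, 5, 5, 1), 3))

# With x < y + theta, a fully active layer silences the next one.
print(next_q(1.0, 5, 5, 1))
```

Under these assumptions every starting level tried settles close to the value of about .37 read from Figure 7(b), approaching it from below when the activity starts low and from above when it starts high.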

15.2 The Concept of Similarity-Generalization

So far, the addition of extra association layers has had no
important effect beyond the sharpening of the discriminative acuity of the
perceptron, generally counterbalanced by a loss in the generalizing capability
of the system. In the next section, we will consider a four-layer
perceptron with special constraints in the organization of the connections
to the A-units, such that the system tends, spontaneously, to generalize
a response associated to a given stimulus pattern to all "similar" stimuli,
regardless of their location in the retinal field. In the following chapter, it
will be shown that such constraints need not be built into the system ab initio,
but can arise through a spontaneous adaptation process (without any intervention
by the r.c.s.) if some simple dynamic laws are introduced. In all
of these systems, the concept of "similarity" is of fundamental importance.

The  term  "similarity"  has  been  used  in  a  number  of  different 
ways,  some  of  them  well-defined,  as  in  "two  triangles  are  similar",  some 
relatively  vague  and  ambiguous,  as  in  "two  faces  are  similar"  or  "two  ideas 
are  similar".  For  present  purposes,  we  have  need  of  a  concept  which  will 
cover  the  range  of  relationships  which  might  make  two  objects  appear 
"similar"  to  a  perceiving  observer,  but  which  will  still  permit  exact 
definition  for  purposes  of  analysis.  We  must  also  distinguish  between 
the  "objective  similarity"  of  objects  in  space,  the  similarity  of  stimuli 
on  the  retina,  and  the  "subjective  similarity"  which  the  observer  recognizes 
and  reports.  While  the  concepts  proposed  here  do  not  cover  all  of  the 
possible  meanings  of  "similarity"  in  psychology,  they  are  sufficient  to 
permit  the  design  of  a  number  of  perceptual  experiments  related  to  the 
similarity  problem. 

15.2.1 Similarity Classes


We  will  first  consider  a  definition  of  similarity  which  is 
applicable  to  the  classification  of  stimuli.  From  this  point  of  view,  two 
stimuli  either  are  similar  or  they  are  not;  there  are  no  intermediate  degrees 
of similarity. In the following section, a quantitative definition which permits
a multidimensional ordering of objects or stimuli according to their
similarity  will  be  considered. 

For  present  purposes,  the  only  constraints  which  will  be  placed 
on  the  logical  nature  of  the  similarity  relation  are  that  it  should  be 
symmetric  and  reflexive;  that  is,  if  A  sim  B,  then  B  sim  A,  and  A  is 
always  similar  to  itself.  It  is  not  required  that  the  relation  of  similarity 
should  be  transitive;  that  is,  A  sim  B  and  B  sim  C  does  not  imply  A  sim  C, 
except  under  very  special  conditions,  as  will  be  seen  below.  There  are 
clearly  a  large  number  of  possible  relations  which  meet  the  logical  conditions 
for  a  similarity  relation.  For  example,  equality,  geometrical  congruence, 
equality  of  area,  and  topological  equivalence  are  all  admissible  possibilities. 
Thus, in specifying the similarity of two stimuli the notation A sim B | $\mathcal{R}$
will be used, where $\mathcal{R}$ is a particular relation, meeting the conditions
of symmetry and reflexivity.

The set of stimuli which are similar under a given relation
will be said to form a similarity class under that relation. For example,
if $\mathcal{R}_r$ is defined as the relation of similarity under a rotation group, then
A sim B | $\mathcal{R}_r$ means that A is a rotated image of B, and B is a rotated
image of A.
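
A similarity class under a rotation group can be sketched for a toy case. The sketch assumes a small square binary retina and restricts the group to quarter-turn rotations, so that every rotated image lands exactly on the grid; both restrictions, and the function names, are assumptions made for illustration.

```python
def rotate(pattern):
    """Rotate a square binary pattern (tuple of row tuples) 90 degrees clockwise."""
    return tuple(zip(*pattern[::-1]))

def similarity_class(pattern):
    """All images of the pattern under the group of quarter-turn rotations."""
    images = {pattern}
    for _ in range(3):
        pattern = rotate(pattern)
        images.add(pattern)
    return images

def sim(a, b):
    """A sim B under the rotation relation: B is a rotated image of A."""
    return b in similarity_class(a)

corner = ((1, 0), (1, 1))
turned = ((1, 1), (1, 0))
bar = ((1, 1), (0, 0))
print(sim(corner, turned), sim(turned, corner), sim(corner, bar))
```

Because the rotations form a group, this particular relation happens to be transitive as well as symmetric and reflexive; the definition of a similarity relation does not require that.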

In perceptual problems, a particular kind of similarity class
is of particular importance. This will be called a projective similarity
class, and is defined as follows. Let the sensory points of a perceptron
be embedded in an $r$-dimensional sensory manifold, $\mathcal{S}$. Let $\mathcal{S}$ be
embedded in an $(r + k)$-dimensional world manifold, $\mathcal{M}$. An object* in $\mathcal{M}$
is defined as any set of points in $\mathcal{M}$. Let $\Omega$ be a set of admissible
objects in $\mathcal{M}$. Let $\mathcal{G}$ be any transformation group in $\mathcal{M}$. Let a
projection $\pi$ be defined as an operation which maps every point in $\mathcal{M}$
into at most one point in $\mathcal{S}$. Then A sim B | $\mathcal{S}, \Omega, \mathcal{G}, \pi$ means
that stimuli A and B are both $\pi$-projections onto the sensory points in
$\mathcal{S}$ of transforms under $\mathcal{G}$ of the same object in $\Omega$.

* The term "object" is used in much the same sense as "distal stimulus"
in psychology. Our use of the term "stimulus" always signifies a
"proximal stimulus" unless otherwise specified.

A few moments' reflection should show that this encompasses
most of the cases in which we say that two stimuli are perceptually
"equivalent"; for example, any group of rigid movements of an object in
3-space will yield a projective similarity class on a two-dimensional
retina. Note that this similarity relation is not generally transitive. For
example, if we let $\mathcal{G}$ be the group of rigid motions in 3-space, and let
$r = 2$, then the similarity classes generated by a flat cut-out of a
square in $\mathcal{M}$, and by a cube in $\mathcal{M}$ (with orthogonal projection onto the
retina) are related by the Venn diagram

[Venn diagram missing in source: two intersecting classes, one generated by the square and one by the cube]

where the intersection includes all cases where the square and a face of
the cube are both parallel to the retinal surface (assuming $\mathcal{S}$ to be
Euclidean, which it is not in a vertebrate eye). A tilted square will be
projected as a parallelogram, whereas a tilted cube is projected either
as a rectangle, pentagon, or hexagon, so that the classes, although they
intersect, are not equivalent.
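
The failure of transitivity in the square and cube example can be sketched with shape labels alone. The table of projections below simply restates the cases named in the text (it is not a geometric computation), and the names `projections` and `sim` are illustrative.

```python
# Shapes each object can project to, as listed in the text (labels only).
projections = {
    "square cut-out": {"square", "parallelogram"},
    "cube": {"square", "rectangle", "pentagon", "hexagon"},
}

def sim(a, b):
    """a sim b when some single object can be projected to both a and b."""
    return any(a in shapes and b in shapes for shapes in projections.values())

stimuli = set().union(*projections.values())
reflexive = all(sim(a, a) for a in stimuli)
symmetric = all(sim(a, b) == sim(b, a) for a in stimuli for b in stimuli)
transitive = all(
    sim(a, c)
    for a in stimuli for b in stimuli for c in stimuli
    if sim(a, b) and sim(b, c)
)
print(reflexive, symmetric, transitive)
```

The relation is reflexive and symmetric, but a parallelogram is similar to a square (through the cut-out) and a square is similar to a hexagon (through the cube), while a parallelogram and a hexagon share no object.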

For the special case in which the points of an object and all of
its transforms in $\mathcal{M}$ can be placed in one-to-one correspondence with the
S-points in $\mathcal{S}$, the relation of projective similarity will be transitive.
This includes the case in which $\mathcal{M}$ and $\mathcal{S}$ are of the same dimensionality
and coextensive, objects and transforms consisting only of sensory points in
$\mathcal{S}$. Most stimulus classes considered in experiments up to this point have
been interpretable in this fashion. Alternatively, $\mathcal{M}$ might have a higher
dimensionality than $\mathcal{S}$, but the group $\mathcal{G}$ may be limited to motions
parallel to the surface of $\mathcal{S}$. Here again, with a suitable choice of $\pi$,
a transitive similarity relation can be obtained.

The case of greatest psychological interest is that of a three-dimensional
world-manifold, $\mathcal{M}$, and a two-dimensional sensory manifold,
$\mathcal{S}$, where $\mathcal{G}$ is the group of rigid motions and dilatations in $\mathcal{M}$.* A
perceptron which generalizes strongly between any two members of a
similarity class defined by such a relation, and generalizes weakly between
stimuli which are not in the same similarity class, will duplicate a large
fraction of the perceptual behavior of a biological organism, in the visual
domain.

*  A  consideration  of  some  of  the  projection  operations  which  apply  to 
this  problem  can  be  found  in  Gibson,  Olum,  and  Rosenblatt,  Ref.  27. 
15.2.2 Measurement of Similarity, Objective and Subjective

Let $\mathcal{G}$ be a Lie-group* (of dimension $r$) of transformations
of the manifold $\mathcal{M}$. Let $B$ be a canonical system of coordinates defined in
the Euclidean $r$-space, $E^r$, such that every system of equations
$g_i(t) = a_i t$ (where $g_i$ is the $i$-th coordinate of $g$ in $B$) gives a one-
parameter subgroup $g(t)$. Then the distance $d(0, g)$ for any $g \in \mathcal{G}$
($g = (g_1, g_2, g_3, \ldots, g_r)$) is

$$d(0, g) = \sqrt{\sum_{i=1}^{r} g_i^2}$$

We then define the similarity measure $\mu(X, Y \mid \mathcal{G}, B)$ for the objects $X$
and $Y$ with respect to $\mathcal{G}$ and $B$ as

$$\mu(X, Y \mid \mathcal{G}, B) = \min_{g \in E} d(0, g) \qquad (15.1)$$

where $E = \{g : X = g(Y),\ g \in \mathcal{G}\}$. (That is, $E$ is the set of all
transformations in $\mathcal{G}$ which will transform the object $Y$ into the object $X$.)

Note that this measure is applicable only to objects in $\mathcal{M}$
which are similar under $\mathcal{G}$; it is not applicable to stimuli unless $\mathcal{S}$
is coextensive with $\mathcal{M}$. Consequently, the measure $\mu$ will be called
the objective similarity measure with respect to $\mathcal{G}$ and $B$. This
measure represents the length of a sort of "shortest path" by which $Y$

* Readers who are unfamiliar with the theory of Lie-groups will find a
useful discussion of this subject in Pontrjagin (Ref. 111).
can be continuously transformed into $X$, by means of transformations
of the group $\mathcal{G}$. The choice of the basis, $B$, determines the relative
weighting attached to various subgroups of $\mathcal{G}$. For example, if $\mathcal{G}$ is
the group of translations in $\mathcal{M}$, then $\mu$ can be made proportional to the
length of the displacement vector which would carry $Y$ into $X$.
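The translation example can be made concrete with a short Python sketch. It assumes a small toroidal grid as the manifold, objects given as sets of grid points, and a reading of the objective measure as the length of the shortest displacement among all translations that carry $Y$ into $X$. The grid, the function names, and the choice to return `None` for objects that are not similar under the group are illustrative assumptions, not part of the text.

```python
import math
from itertools import product


def translate(points, dx, dy, width):
    """Translate a set of grid points on a width-by-width torus."""
    return frozenset(((x + dx) % width, (y + dy) % width) for x, y in points)


def wrapped_length(dx, dy, width):
    """Length of the shortest displacement equivalent to (dx, dy) on the torus."""
    sx = min(dx % width, -dx % width)
    sy = min(dy % width, -dy % width)
    return math.hypot(sx, sy)


def objective_similarity(X, Y, width):
    """Shortest translation carrying Y into X, or None if no translation does."""
    X = frozenset(X)
    # E: every translation g in the group with g(Y) = X.
    E = [(dx, dy) for dx, dy in product(range(width), repeat=2)
         if translate(Y, dx, dy, width) == X]
    if not E:
        return None
    return min(wrapped_length(dx, dy, width) for dx, dy in E)
```

With this convention a small value means "highly similar" (a short path), and two objects which are not transforms of one another have no value at all, which matches the remark that the measure applies only to objects similar under the group.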

Let us also define the subjective similarity measure $\mu'(X, Y)$ with
respect to a perceptron, $P$, a response unit, $r$, and a projection
operator $\Pi$, by

[Equation (15.2) is illegible in the source.]    (15.2)

where $Q_{ij}$ is the value of $Q$ for the stimuli corresponding to the
objects $X$ and $Y$ (under the projection $\Pi$) measured in the source set
of the response unit $r$. For an $\alpha$-system, and stimuli of fixed size,
$\mu'$ is proportional to the generalization coefficient $g_{ij}$, for the
response $r$. For two identical stimuli, $\mu'(X, X) = 1$. If the value
of $\mu'(X, Y)$ is a monotonic function of the objective similarity of the
objects $X$ and $Y$, we would expect the response $r$ to generalize most
strongly to highly "similar" objects, and most weakly to dissimilar objects.
Over any given subgroup of transformations of an object in $\mathcal{M}$, this
induces a "generalization gradient" equivalent to the use of the term in
experimental psychology.


A perceptron which is to simulate perceptual performance
must have or acquire a close correlation between the subjective and
objective similarities of objects in physical space, under the group of
rigid motions and some kinds of continuous deformation. A perceptron in
which such a correlation exists is said to be capable of similarity
generalization. Similarity generalization implies that the perceptron not only
tends to generalize to similar objects, but retains its ability to respond
differentially to dissimilar objects. The demonstration of such a capability
will be our main concern for the remainder of this chapter and the following
four chapters.

15.3  Four-Layer  Systems  with  Intrinsic  Similarity  Generalization 
15.3.1  Perceptron  Organization 


The four-layer perceptrons to be analyzed have fixed connections
except for the terminal A to R-unit connections, and a topology which is
illustrated in Figure 40. S, A, and R-units are all assumed to be of the
simple variety, resembling those of an elementary perceptron. The
special features of this system (which might be called a "similarity-
constrained perceptron") are the following:

(1) Each $A^{(1)}$ unit has a threshold $\theta$, $x$ excitatory and
$y$ inhibitory input connections, and a single output connection to one of the
$A^{(2)}$ units.

(2) Each $A^{(2)}$ unit receives connections from a source
set of $m$ $A^{(1)}$ units, and has a threshold equal to 1.
Figure 40 ORGANIZATION OF A SIMILARITY-CONSTRAINED PERCEPTRON ($x = 2$, $y = 1$,
$m = 3$). TRANSLATION GROUP IN TOROIDAL RETINA.
(Connection labels in the figure: VALUES = ±1, VALUES = +1, VARIABLE VALUES.)
(3) The values of all connections from $A^{(1)}$ to $A^{(2)}$
units are equal to +1.

(4) All $A^{(1)}$ units in the source set of a given $A^{(2)}$
unit have origin point configurations which are members of a similarity
class, under some similarity relation $\sim$.


The subsequent discussion will be limited to the special case
in which the similarity relation $\sim$ is equivalent to similarity under a
transformation group, $\mathcal{G}$, in the sensory space of the perceptron. This
means that, when an origin configuration has been picked for one of the
$A^{(1)}$ units connected to a given $A^{(2)}$ unit, the remaining $m - 1$ $A^{(1)}$ units
connected to the same $A^{(2)}$ unit must have origin configurations which
are transforms under $\mathcal{G}$ of the first configuration selected. This is
illustrated in Fig. 40 for a case in which $m = 3$, and the transformation
group is the group of horizontal and vertical translations on the retina.

In the model to be analyzed, it is assumed that a single template configuration
is chosen at random for each $A^{(2)}$ unit, and the $m$ origin configurations
actually assigned to the $A^{(1)}$ units are obtained by selecting $m$
transformations at random, without replacement, from the group $\mathcal{G}$. This yields
the auxiliary condition that no two $A^{(1)}$ units in the same source set have
identical origin point configurations.


In the case considered here, the world manifold $\mathcal{M}$ and the sensory
space $\mathcal{S}$ are taken to be coextensive, with a one-to-one correspondence
between objects in $\mathcal{M}$ and stimuli in $\mathcal{S}$.
15.3.2 Analysis


To begin with, we will attempt to provide an intuitive basis
for understanding the functioning of the similarity-constrained perceptron.
At one extreme, if $m = 1$, note that the system becomes functionally
equivalent to an elementary perceptron of the binomial variety, with A-units
having the same parameters as the $A^{(1)}$ units in the 4-layer model. At
the other extreme, where $m$ is equal to the order of the transformation
group, there is one $A^{(1)}$ unit in each source set for every possible
transform of the "template configuration". Now if one of the $A^{(1)}$ units whose
origin configuration is $\omega$ responds to a stimulus $S_i$, any transform
$S_j = T(S_i)$ will necessarily activate the $A^{(1)}$ unit whose origin configuration
is the transform $T(\omega)$. Since both of these $A^{(1)}$ units are connected
to the same $A^{(2)}$ unit, this unit will respond both to $S_i$ and $S_j$,
since its threshold is 1, and the values of the connections from $A^{(1)}$
to $A^{(2)}$ units are fixed at 1. Thus we have the rule that any $A^{(2)}$ unit
which responds to a stimulus will also respond to all transforms
under the group $\mathcal{G}$. Alternatively, we could state that if $S_i \sim S_j$
and an $A^{(2)}$ unit responds to $S_i$, then this unit will also respond to
$S_j$. Next suppose that in addition to making $m$ equal to the order of the
group, the threshold of the $A^{(1)}$ units is $\theta$ = number of excitatory
origins = area of the stimuli, and the number of inhibitory origins is
equal to the complementary area, so that an $A^{(1)}$ unit will respond to
only one stimulus. We then have an ideal situation, in which an $A^{(2)}$ unit
responds to all the members of a given similarity class, and only to
members of that similarity class. Under these conditions, if we show the
perceptron a stimulus, say a square, and associate a response to that
square, this response will immediately generalize perfectly to all
transforms of the square under the group $\mathcal{G}$, and will not generalize at
all to any stimulus which is not a transform of the square under $\mathcal{G}$.
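The limiting argument above can be checked on a small example. The following Python sketch builds the source set of a single $A^{(2)}$ unit on a toroidal retina under the translation group, and verifies that when $m$ equals the order of the group the unit gives the same response to a stimulus and to every translate of it. The activation rule (excitatory hits minus inhibitory hits compared with $\theta$), the retina size, and all function names are assumptions chosen for illustration.

```python
import random
from itertools import product


def translate(points, dx, dy, w):
    """Translate a set of retinal points on a w-by-w toroidal retina."""
    return frozenset(((x + dx) % w, (y + dy) % w) for x, y in points)


def a1_responds(exc, inh, stimulus, theta):
    """Simple A(1) unit: excitatory minus inhibitory hits must reach theta."""
    return len(exc & stimulus) - len(inh & stimulus) >= theta


def make_a2_source_set(w, x, y, m, rng):
    """Pick one random template and m distinct translations of it."""
    cells = [(i, j) for i in range(w) for j in range(w)]
    origins = rng.sample(cells, x + y)
    exc, inh = frozenset(origins[:x]), frozenset(origins[x:])
    # m transformations drawn without replacement from the translation group.
    shifts = rng.sample(list(product(range(w), repeat=2)), m)
    return [(translate(exc, dx, dy, w), translate(inh, dx, dy, w))
            for dx, dy in shifts]


def a2_responds(source_set, stimulus, theta):
    """A(2) unit: threshold 1 and +1 connections, so one active A(1) suffices."""
    return any(a1_responds(e, i, stimulus, theta) for e, i in source_set)


def is_translation_invariant(source_set, stimulus, theta, w):
    """True if the A(2) response is the same for every translate of the stimulus."""
    base = a2_responds(source_set, stimulus, theta)
    return all(
        a2_responds(source_set, translate(stimulus, dx, dy, w), theta) == base
        for dx, dy in product(range(w), repeat=2)
    )


if __name__ == "__main__":
    rng = random.Random(0)
    w, x, y, theta = 4, 2, 1, 2
    cells = [(i, j) for i in range(w) for j in range(w)]
    full = make_a2_source_set(w, x, y, m=w * w, rng=rng)  # m = order of group
    stim = frozenset(rng.sample(cells, 5))
    print(is_translation_invariant(full, stim, theta, w))  # prints True
```

With `m` smaller than the order of the group the invariance is no longer guaranteed, which is the situation treated quantitatively in the remainder of this section.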

The conditions considered above, where $m$ is equal to the
order of the group, and each $A^{(1)}$ unit responds to only one possible
stimulus, are impractical in the extreme, for a retina of reasonable
size. It should be clear from the above arguments, however, that even
with smaller values of $m$ (so long as $m > 1$) and lower thresholds, a bias
will exist for an $A^{(2)}$ unit to respond to similar stimuli, rather than
dissimilar stimuli, under the group $\mathcal{G}$. We now pass on to a quantitative
analysis of the performance of this system, first for an environment of
random "salt-and-pepper" stimuli, and then for an environment of square
stimuli.


The performance of a four-layer perceptron of the type under
consideration can be obtained from preceding analyses of elementary
perceptrons if we know the G-matrix or the Q-functions of the $A^{(2)}$ units. The
expected performance of the system (or the actual performance of a very
large system) is entirely determined by the functions $Q^{(2)}_{ij}$, i.e., the
probability that a second-layer A-unit will respond both to $S_i$ and to $S_j$.
We will consider the case of a perceptron with $N_s$ sensory points, and
a universe of random dot-stimuli, each consisting of $n_s$ sensory
points chosen at random from a uniform distribution. Let $T$ be any
transformation in $\mathcal{G}$, such that the measure of the set of fixed points
under the transformation is zero. We will use the notation $S_i'$ to
denote the transform $T(S_i)$, and $S_i^*$ to denote some other transform
$T^*(S_i)$, ($T^* \ne T$). With this notation, $Q^{(2)}_{ii'}$ is the probability that
an $A^{(2)}$ unit responds to $S_i$ and to $T(S_i)$, and $Q^{(2)}_{ii^*}$ is the probability
that it responds to $S_i$ and to $T^*(S_i)$.


First of all, we have

$$Q^{(2)}_{ii'} = Q^{(2)}_i \, Q^{(2)}_{i'|i} \qquad (15.3)$$

where $Q^{(2)}_{i'|i}$ = conditional probability that an $A^{(2)}$ unit responds to $S_i'$
given that it responds to $S_i$. For the first factor of this expression, we
have the close approximation

$$Q^{(2)}_i \approx 1 - \left(1 - Q^{(1)}_i\right)^m \qquad (15.4)$$

This approximation assumes that the $m$ $A^{(1)}$ units connected to an $A^{(2)}$ unit
all have an independent chance of responding to stimulus $S_i$. This will be
approximately true if $\theta \ll n_s$ for the $A^{(1)}$ units. In this case, since the
stimuli consist of random point configurations, the knowledge that an origin
point of the first unit falls on an active S-point still leaves $n_s - 1$
possible S-points in the same stimulus, any one of which might coincide
with the transform of the origin point for one of the other $A^{(1)}$ units. In
the range of parametric conditions with which we are generally concerned,
equation (15.4) approaches a perfect equality.
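As a small numerical check of the independence argument, the sketch below assumes that approximation (15.4) takes the usual "at least one of $m$ independent events" form, $1 - (1 - Q^{(1)}_i)^m$, and compares it with a direct enumeration over all activity patterns of $m$ independent units. The function names and parameter values are illustrative only.

```python
from itertools import product


def q2_approx(q1, m):
    """Probability that at least one of m independent A(1) units responds."""
    return 1.0 - (1.0 - q1) ** m


def q2_by_enumeration(q1, m):
    """Same probability, summed explicitly over all 2**m activity patterns."""
    total = 0.0
    for pattern in product([0, 1], repeat=m):
        p = 1.0
        for active in pattern:
            p *= q1 if active else (1.0 - q1)
        if any(pattern):  # the A(2) unit has threshold 1
            total += p
    return total


if __name__ == "__main__":
    for m in (1, 2, 5):
        print(m, q2_approx(0.05, m), q2_by_enumeration(0.05, m))
```

The two columns agree to rounding error, so any departure of a real network from (15.4) must come from dependence between the $A^{(1)}$ units, not from the formula itself.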
For the second factor in (15.3) we have the approximation
(which is accurate for small $Q^{(1)}_i$)

$$Q^{(2)}_{i'|i} \approx \frac{m-1}{\omega-1} + \left(1 - \frac{m-1}{\omega-1}\right)\left[1 - \left(1 - Q^{(1)}_{i'|i^*}\right)^m\right] \qquad (15.5)$$

where $\omega$ is the order of the group $\mathcal{G}$. The first term of this
expression, $\frac{m-1}{\omega-1}$, is the probability that one of the $m - 1$ $A^{(1)}$ units,
other than the one which is known to have responded to $S_i$, has an origin
configuration which is a $T$-transform of the configuration of the "known"
A-unit. There are $m - 1$ non-identical possibilities that this transform is
present, and $\omega - 1$ transforms from which they are chosen. If this condition
is met, then the $A^{(2)}$ unit must certainly respond to $T(S_i)$. If this
condition is not met, with probability $1 - \frac{m-1}{\omega-1}$, it is still possible that one
of the $A^{(1)}$ units responds to $T(S_i)$, and this probability is given by the
last term of the above expression. Here $Q^{(1)}_{i'|i^*}$ is the probability that an
$A^{(1)}$ unit, which is known to respond to some transform $T^*(S_i)$, will also
respond to $S_i'$. Since $T^*$ may be any transformation (including the
identity) so long as it is not equal to $T$, all of the $m$ $A^{(1)}$ units are
equally good candidates for such a response. Specifically, for the case
under consideration,

$$Q^{(1)}_{i'|i^*} = \sum_{n_c} P(n_c)\, Q^{(1)}_{i'|i^*}(n_c) \qquad (15.6)$$
where $n_c$ = the number of common sensory points in $S_i'$ and $S_i^*$,
with probability

$$P(n_c) = \binom{n_s}{n_c}\, p^{n_c} (1 - p)^{n_s - n_c}, \qquad p = \frac{n_s - 1}{N_s - 1} \qquad (15.7)$$

Note that the probability $p$ that a point in $S_i^*$ is in the common area is
based on $N_s - 1$ possible locations, since it cannot occupy the location of its
transform in $S_i'$; however, there are $n_s - 1$ other points in $S_i'$ whose
locations it might occupy. The only quantity which we still lack is
$Q^{(1)}_{i'|i^*}(n_c)$, which is given by

$$Q^{(1)}_{i'|i^*}(n_c) = \frac{Q^{(1)}_{i'i^*}(n_c)}{Q^{(1)}_{i^*}}$$

where $Q^{(1)}_{ij}(C)$ is computed from Equation (6.5) with $C$ corresponding to
$n_c$ common points.

Substituting, we have

[Equation (15.8) is illegible in the source.]    (15.8)
Note that as $N_s$, the number of retinal points, goes to infinity (with
$n_s / N_s$ constant) this quantity approaches

$$Q^{(1)}_{i'|i^*} = Q^{(1)}_{j|i}$$

which is equal to $Q_{j|i}$ for the binomial model, $S_j$ being a random stimulus
unrelated to $S_i$. At the same time, the first
term of (15.5) goes to zero if $m$ remains finite and the order of the group
increases with the number of possible retinal locations of the stimulus.
Thus, for an infinite retina and a transformation group of infinite order,
we have

$$Q^{(2)}_{i'|i} = 1 - \left(1 - Q^{(1)}_{j|i}\right)^m \qquad (15.9)$$

and

$$Q^{(2)}_{ii'} = Q^{(2)}_i \left[1 - \left(1 - Q^{(1)}_{j|i}\right)^m\right] \qquad (15.10)$$

which is identical to the expression for $Q^{(2)}_{ij}$ for a pair of random,
unrelated stimuli. Thus, with an infinite retina, no additional generalization
is to be expected from a random stimulus to its transform under the conditions
assumed above. For a finite retina, however, (or for a finite group $\mathcal{G}$)
we have the inequality

$$Q^{(2)}_{ii'} > Q^{(2)}_{ij}$$

due to the effect of the first term in equation (15.5).
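The role of the first term of (15.5) can be seen numerically. The sketch below assumes that (15.5) has the form: first term $(m-1)/(\omega-1)$, plus the complementary probability times $1 - (1 - q)^m$, where $q$ stands for the conditional probability $Q^{(1)}_{i'|i^*}$. For simplicity $q$ is held fixed while the order of the group $\omega$ is increased; in the text $q$ itself also varies with the size of the retina. Function names and numerical values are illustrative assumptions.

```python
def q2_conditional(q, m, omega):
    """Assumed form of (15.5) for a group of finite order omega."""
    first = (m - 1) / (omega - 1)  # chance that the exact T-transform is present
    return first + (1.0 - first) * (1.0 - (1.0 - q) ** m)


def q2_conditional_infinite(q, m):
    """Limit for a group of infinite order: the first term vanishes."""
    return 1.0 - (1.0 - q) ** m


if __name__ == "__main__":
    q, m = 0.1, 4
    for omega in (36, 144, 10_000, 1_000_000):
        print(omega, q2_conditional(q, m, omega))
    print("limit", q2_conditional_infinite(q, m))
```

For any finite `omega` and `m > 1` the finite-group value exceeds the limit, and the excess shrinks in proportion to $(m-1)/(\omega-1)$, which is the source of the inequality stated above. At `m == omega` the expression equals 1, and at `m == 1` it reduces to `q`.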
Let us now turn to a modification of the above problem, in
which the environment consists of square patterns with edges aligned
in a square (toroidal) retina, and the group consists of all possible
translations. In particular, we will take the transformation $T$ to be
a lateral translation by half the width of the retina. The notation $S_i'$
will be used for $T(S_i)$, and $T^*$ will be taken to mean any transformation
in $\mathcal{G}$ not equal to $T$ and not equal to the identity transformation. For
convenience, we restrict the area of the stimuli so that $R \le .25$. This
guarantees that $S_i$ and $S_i'$ are always disjoint patterns. $Q^{(1)}_i$ is
again assumed to be small. In this case we have, in place of (15.5)

$$Q^{(2)}_{i'|i} \approx \frac{m-1}{\omega-1} + \left(1 - \frac{m-1}{\omega-1}\right)\left[1 - \left(1 - Q^{(1)}_{i'|i}(0)\right) E \prod_{k=1}^{m-1}\left(1 - Q^{(1)}_{i'|i^*_k}\right)\right]$$

where the expectation is with respect to selections of transformations
such that $T^*(S_i) = S_i^*$.


To avoid the computation of this expectation, we make the
further approximation that the expectation of the product of the above
sequence of Q-functions is equal to the product of the expected values of
the Q-functions. Now it can be shown that for any distribution of the
common area,

[Relation illegible in the source.]

It follows from this that the approximation which we now propose to make
will be a conservative one, yielding values of $Q^{(2)}_{i'|i}$ which are slightly
smaller than they should be. With this approximation, we now have:
$$Q^{(2)}_{i'|i} \approx \frac{m-1}{\omega-1} + \left(1 - \frac{m-1}{\omega-1}\right)\left[1 - \left(1 - Q^{(1)}_{i'|i}(0)\right)\left(1 - Q^{(1)}_{i'|i^*}\right)^{m-1}\right] \qquad (15.11)$$

since the "known" $A^{(1)}$ unit which responds to $S_i$ has the conditional
probability

$$Q^{(1)}_{i'|i}(0) = \frac{Q^{(1)}_{ii'}(0)}{Q^{(1)}_i}$$

of responding to the disjoint transform $S_i'$. The expression for $Q^{(1)}_{i'|i^*}$
is again given by (15.6), only the probability $P(n_c)$ is different
from the random stimulus case. A general equation for $P(n_c)$ will not be
developed here, for a finite retina; in particular cases, it is obtained by
counting all of the possible ways in which a square and its translate can
intersect to yield $n_c$ common points. Some numerical examples will be
considered in the following section. Note that the modification from
Equation (15.5) to (15.11) will have the effect of tending to diminish the
value of $Q^{(2)}_{i'|i}$ for small values of $m$, so that for $m = 1$ the generalization
to a disjoint square will always be less than the generalization from a
square to a random stimulus of the same area, which is still given by

[Equation (15.12) is illegible in the source.]    (15.12)
If we go to the limit of an infinite retina, (and infinite
transformation group) with the environment of square stimuli just considered,
the results differ considerably from the random stimulus case. The
difference is due to the distribution of the common area, $C$, which, in
the case of the random stimuli, went to $R^2$ with probability 1. In the
case of randomly placed square stimuli, the probability of a zero
intersection in an infinite retina is given by

$$P(C = 0) = 1 - \frac{4\ell^2}{r^2} \qquad (15.13)$$

where $\ell$ = length of edge of square,
$r$ = width of retina ($r \ge 2\ell$).

The probability of $0 < C < c$ (for $c \le \ell^2$) will be $4/r^2$ times the area under the
hyperbola $xy = c$ within the square $0 \le x, y \le \ell$. Specifically,

$$P(0 < C < c) = \frac{4}{r^2}\left[c + \int_{c/\ell}^{\ell} \frac{c}{x}\,dx\right] = \frac{4}{r^2}\left[c + c \ln\frac{\ell^2}{c}\right]$$

Differentiating,

$$P(c) = \frac{4}{r^2} \ln\frac{\ell^2}{c}, \qquad 0 < c \le \ell^2 \qquad (15.14)$$
Thus, for a square stimulus of area $R$ in a retina of area 1 ($R \le .25$)
we have

$$Q^{(1)}_{i'|i^*} = \int_0^R P(c)\, Q^{(1)}_{i'|i^*}(c)\, dc + (1 - 4R)\, Q^{(1)}_{i'|i^*}(0) \qquad (15.15)$$

Substituting this in (15.11) yields an expression for $Q^{(2)}_{i'|i}$ for the infinite
retina, and $Q^{(2)}_{ii'}$ can be computed by (15.3), as usual.
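The weight $1 - 4R$ attached to the zero-intersection term can be checked by simulation. The sketch below assumes a continuous toroidal retina of unit width, two axis-aligned squares of area $R \le .25$, and a relative displacement drawn uniformly over the retina. Under those assumptions the two squares overlap only if both displacement components are smaller than the edge length, so the probability of no overlap is $1 - 4R$, and the probability that the overlap exceeds $c$ is $4(R - c - c \ln(R/c))$. The function names, sample size, and seed are illustrative choices.

```python
import math
import random


def overlap_area(dx, dy, ell):
    """Common area of two aligned squares of edge ell displaced by (dx, dy)."""
    ox = max(0.0, ell - abs(dx))
    oy = max(0.0, ell - abs(dy))
    return ox * oy


def simulate_overlaps(R, trials, seed=0):
    """Sample the common area for uniformly random relative displacements."""
    rng = random.Random(seed)
    ell = math.sqrt(R)
    return [
        overlap_area(rng.uniform(-0.5, 0.5), rng.uniform(-0.5, 0.5), ell)
        for _ in range(trials)
    ]


def prob_overlap_exceeds(c, R):
    """Closed form for P(C > c), valid for 0 < c <= R and R <= 0.25."""
    return 4.0 * (R - c - c * math.log(R / c))


if __name__ == "__main__":
    R = 0.16
    areas = simulate_overlaps(R, 200_000)
    frac_zero = sum(1 for a in areas if a == 0.0) / len(areas)
    frac_big = sum(1 for a in areas if a > 0.04) / len(areas)
    print("P(C = 0):    simulated", frac_zero, "expected", 1 - 4 * R)
    print("P(C > 0.04): simulated", frac_big,
          "expected", prob_overlap_exceeds(0.04, R))
```

The point of the exercise is that the common area of coherent stimuli keeps a spread-out distribution on an arbitrarily large retina, instead of collapsing onto a single value as it does for random dot patterns.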


15.3.3  Examples 


Figure 41 illustrates the behavior of a similarity-constrained
perceptron, as a function of $m$, for various combinations of retinal
size and types of stimuli. The transformation group, in each case,
consists of all horizontal and vertical translations in a square, toroidally
connected retina. The stimuli considered are a pair of independent
random-dot stimuli, $S_i$ and $S_j$, a square stimulus, and the transforms
of these stimuli, where the transformation employed is a shift of
half the width of the retina. This guarantees that the square stimulus
is disjoint from its transform. All stimuli have an area $R$ equal
to one fourth of the retina. The parameters of the $A^{(1)}$ units are
$x = y = 4$, $\theta = 2$.
The bottom solid curve provides a baseline, with which the
other conditions can be compared. This curve is identical for $Q^{(2)}_{ij}$
(both stimuli random and independent), $Q^{(2)}_{ii'}$ (random stimulus and its
transform) where $N_s$ is infinite, and for a square stimulus vs. a
random stimulus. In a small, finite retina however (specifically, with
$N_s = 36$) a random stimulus will generalize more strongly to its
transform than to an independent random stimulus, for any $m > 1$.
This is shown by the upper of the two solid curves. The broken curves
illustrate the generalization from a square to its (disjoint) transform, both
for the 6 by 6 retina, and for the infinite retina. In both cases, we find
that the system generalizes more strongly to a random stimulus if $m$ is
small, but that as $m$ is increased, the perceptron begins to generalize
more strongly to the disjoint transform than to a random, unrelated
stimulus. For the infinite retina, the cross-over occurs between $m = 4$
and $m = 5$. This means that for a $\gamma$-system, with $m \ge 5$, $g_{ij}$ will
be positive from a square to any other square, and will be zero from a square
to a random dot stimulus. Increasing the threshold of the $A^{(1)}$ units will
reduce $Q_{ij}$ for all curves, but will increase the relative bias towards
similar stimuli, and will shift the cross-over point further to the left for
the $Q^{(2)}_{ii'}$ curves.

The difference in performance for squares as opposed to
random stimuli will tend to be characteristic of any coherent stimulus
patterns, provided the transformation group is one which preserves the
coherence, or compactness, of the stimuli. This may be puzzling to
some readers who recognize that under the connection rules employed
in these perceptrons, there is nothing unique about topologically connected,
or continuous regions, which would affect the perceptron's ability to
recognize them in any different way than disconnected regions. It is,
after all, only the set of points to which connections happen to be made
which determines the response of a perceptron, and if every S-unit were
randomly interchanged with some other S-unit, a corresponding change
being induced in the stimulus environment, the performance of the
perceptron should not be affected at all. This will indeed be true,
provided any transformation group employed in the first perceptron is
replaced by a new transformation group corresponding to the rearranged
retina. The essential feature of coherent stimuli with a group of coherence-
preserving transformations is that the probability distribution of stimulus-
intersections does not concentrate at the expected value of the intersection,
as $N_s$ and the order of the group become infinite. This permits a
similarity bias to be maintained for such stimuli which cannot be
maintained for random stimuli. Any group generated by a permutation
operation on the points of the retina will have the same property, provided
the same permutation operation is applied to the stimuli. Another way
of looking at the problem is to note that with random stimuli, a sensory origin
point which is close to a stimulus point, but does not coincide with it
exactly, has a probability of being activated no greater than that of any
other origin-point. With coherent stimuli, on the other hand, an origin-
point which is close to a stimulus point has a greater probability of being
activated than one which is remote from the stimulus point. Thus, for
random stimuli, only a transformed origin configuration which corresponds
exactly to the transformation $T$ will help in generalizing from $S$
to $T(S)$. For coherent stimuli, it is sufficient that the transformed
origin points should be in the neighborhood of the required transform;
proximity to the required transformation is sufficient to increase the
probability of being activated by $T(S)$.

Note that as $m$ increases, the value of $Q_{ij}$ tends to approach
unity for all curves in Fig. 41. This means that there will be a maximum
similarity bias at some finite value of $m$, beyond which the advantage of
similar over random stimuli will approach zero. By increasing the value
of $\theta$ for the $A^{(1)}$ units, the location of the maximum bias can be shifted
further to the right, until, with $\theta = n_s$, the maximum will occur at
$m = \omega$.

15.4  Laws  of  Similarity-Generalization  in  Perceptrons 

The results obtained in the previous section illustrate a
number of effects which are found quite generally in perceptrons which
show a capability for similarity-generalization, regardless of whether this
capability is learned or intrinsic, and regardless of whether the perceptron
is series-coupled or cross-coupled. Additional evidence for these general
results will be found in subsequent chapters, and they appear to take on
the status of empirical laws, which have now been substantiated for a
rather wide variety of systems. These laws can be tentatively stated as
follows:

The effects noted here are directly analogous to those originally
predicted for cross-coupled systems in Ref. 85.
(1) As the size of the retina increases, it becomes increasingly
difficult to recognize the similarity of two random-pattern stimuli under a
given transformation group, with a finite perceptron. With an infinite
retina (and transformation group of infinite order) the similarity bias for
random stimuli goes to zero.

(2) The similarity bias for coherent stimuli, under a
coherence-preserving transformation group, will generally be stronger
than for random stimuli, and will not go to zero even for an infinite retina
and transformation group of infinite order.

(3) The similarity bias of a perceptron can be increased
by raising the threshold of its A-units or by increasing the number of
connections to terminal A-units (i.e., generalization will be limited
increasingly to the members of a similarity class, as the threshold or
number of pre-terminal units is increased).

(4) Generalization to disjoint transforms of a stimulus
may be less than generalization to independent random patterns, for a
perceptron with weak similarity bias; generalization to disjoint transforms
can be made to exceed generalization to random stimuli, however, by an
increase in A-unit thresholds or by increasing the number of inputs to
the terminal A-units of the network.


16. FOUR-LAYER PERCEPTRONS WITH ADAPTIVE PRETERMINAL NETWORKS

The physical universe, at a macroscopic level, is characterized
by the continuity of its transformations through time. Objects do not
suddenly appear out of nowhere, persist for an instant, and then vanish
into nothingness. Given an appropriate time-scale, all changes appear to
occur smoothly and progressively. Consequently, stimuli which are highly
similar under a continuous transformation group are more likely to occur in
close temporal succession than dissimilar stimuli. In this chapter, it will
be shown that an initially unbiased perceptron can take advantage of this
property of the physical environment to evolve a capability for similarity
generalization, without any intervention by an experimenter or reinforcement
control system.

The model which is presented here was developed jointly by
Block, Knight, and Rosenblatt, in the hope that its analysis would assist
in the understanding of closely related problems which occur in cross-
coupled systems. The similarity between the performance of this system
and the performance of cross-coupled systems is most striking, as will
be seen in later chapters. The main effects of cross-coupling will be to
accelerate the adaptation process, and to make the system inherently
responsive to stimulus sequences, rather than momentary stimuli. The
presentation in the first parts of this chapter is essentially the same as
that of Block, Knight, and Rosenblatt (Ref. 7).


16.1 Description of the Model


The perceptron to be analyzed is illustrated in Fig. 42. It is
a four-layer series-coupled system, with an equal number ($N_a$) of $A^{(1)}$
units and $A^{(2)}$ units.* Each $A^{(2)}$ unit receives a variable-valued
connection from each of the $A^{(1)}$ units. In addition, each $A^{(2)}$ unit
receives a fixed-value connection from one of the $A^{(1)}$ units. For
convenience, the $A^{(1)}$ and $A^{(2)}$ units are placed in one-to-one correspondence,
with the fixed connection to each $A^{(2)}$ unit originating from its "mate" in
the $A^{(1)}$ layer. The threshold of the $A^{(1)}$ units is $\theta^{(1)}$, and the
threshold of the $A^{(2)}$ units is $\theta^{(2)}$. To simplify notation, we will use
the symbol $\theta$ to designate $\theta^{(2)}$, unless otherwise indicated. The fixed
connections from $A^{(1)}$ to $A^{(2)}$ units all have values $\ge \theta$. For
specificity, we assume that all of these fixed values are exactly equal to $\theta$.
The variable-valued connection from an $A^{(1)}$ unit $a^{(1)}_i$ to an $A^{(2)}$ unit $a^{(2)}_j$
has a value $u_{ij}(t)$ at time $t$. The symbol $u_{ij}$ will be used to designate
values of $A^{(1)}$ to $A^{(2)}$ connections, and $v_{ij}$ to designate values of $A^{(2)}$
to R-unit connections. The input connections to the $A^{(1)}$ units may be
organized according to any of the models (e.g., binomial or Poisson) which
were discussed in Part II. Signal transmission times, $\tau_{ij}$, are assumed
to be equal to zero, for all connections. It is assumed that stimuli occur at
times $t$, $t + \Delta t$, $t + 2\Delta t$, etc.


The  numbers  of  units  need  not  be  equal  for  systems  of  this  type  to 
work;  the  constraint  is  introduced  in  order  to  simplify  the  analysis.  It 
is equally satisfactory to organize the perceptron with variable-valued
connections and one fixed-value connection to each $A^{(2)}$ unit,
with origins chosen at random.

Figure 42  AN ADAPTIVE FOUR-LAYER PERCEPTRON (SOLID LINES = FIXED VALUE CONNECTIONS,
BROKEN LINES = VARIABLE VALUE CONNECTIONS)


The variable values $u_{ij}$ are assumed to be initially equal to zero, and change with time as follows: If unit $a_j^{(1)}$ is active at time $t$ and $a_i^{(2)}$ is active at $t + \Delta t$, then $u_{ij}$ receives an increment $(\eta \cdot \Delta t)$, and all connections $u_{ij}$ decay by a quantity $(\delta \cdot \Delta t)\, u_{ij}$. The values of the $A^{(2)}$ to R-unit connections may be varied by any one of the usual reinforcement rules. Note that under these rules, the values $u_{ij}$ will always be non-negative, so that if the "mate" of a given $A^{(2)}$ unit is active, the $A^{(2)}$ unit will always be active. In the subsequent analysis, it will be shown that with a suitable sequential organization of the environment, these dynamic rules can lead to the development of a perceptron organization closely analogous to that of the similarity-constrained perceptrons of the previous chapter.
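
The growth-and-decay rule just described can be sketched in a few lines. This is only an illustration under stated assumptions: the function names `update_u` and `a2_activity`, the list-of-lists layout `u[i][j]` (connection from $A^{(1)}$ unit $j$ to $A^{(2)}$ unit $i$), and the parameter names `eta`, `delta`, `dt` are hypothetical and do not come from the text.

```python
def update_u(u, a1_prev, a2_now, eta, delta, dt):
    """Advance the variable A(1)->A(2) connection values by one time step.

    u[i][j]  : current value of the connection from A(1) unit j to A(2) unit i
    a1_prev  : 0/1 activity of the A(1) units at the earlier time t
    a2_now   : 0/1 activity of the A(2) units at the later time t + dt
    """
    n = len(u)
    return [
        [
            # gain only where unit j fired first and unit i fired next;
            # every connection also decays in proportion to its own value
            u[i][j] + eta * dt * a2_now[i] * a1_prev[j] - delta * dt * u[i][j]
            for j in range(n)
        ]
        for i in range(n)
    ]


def a2_activity(u, a1, theta):
    """0/1 activity of the A(2) units, given A(1) activity a1.

    The fixed connection from each unit's mate is assumed to have value theta.
    """
    n = len(u)
    return [
        1 if theta * a1[i] + sum(u[i][j] * a1[j] for j in range(n)) >= theta else 0
        for i in range(n)
    ]
```

With all values starting at zero and `delta * dt` below 1, the values stay non-negative, so an $A^{(2)}$ unit whose mate is active always reaches threshold in this sketch.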


16.2 General Analysis

16.2.1 Development of the Steady-State Equation

As in the last chapter, our main concern will be to find the values of $Q_{ij}^{(2)}$, which will permit further analysis to proceed along the lines employed for elementary perceptrons. Unlike the perceptrons of Chapter 15, however, the values of $Q_{ij}^{(2)}$, and consequently the G-matrix of the perceptron, are stochastic variables, depending upon the prior history of the system.

The set of A-units in the $A^{(1)}$ layer responding to $S_i$ will be denoted by $A^{(1)}(S_i)$; the set responding to both $S_i$ and $S_j$ is $A^{(1)}(S_i) \cap A^{(1)}(S_j)$. For a perceptron with a known connection scheme for the $A^{(1)}$ layer (or for a sufficiently large perceptron) the fraction of $A^{(1)}$ units responding to both $S_i$ and $S_j$ will be $Q_{ij}^{(1)}$ and is equal to the number of elements in $A^{(1)}(S_i) \cap A^{(1)}(S_j)$ divided by $N_a$. These quantities are fixed for all time.


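Let $\alpha_\mu^{(i)}(t)$ denote the total input signal to the unit $a_\mu^{(2)}$ at time $t$, in response to stimulus $S_i$.* Then

$$\alpha_\mu^{(i)}(t) = \theta\, e_\mu^{(i)} + \sum_{\nu=1}^{N_a} u_{\mu\nu}(t)\, e_\nu^{(i)} \tag{16.1}$$

where

$$e_\nu^{(i)} = \begin{cases} 1 & \text{if } S_i \text{ activates } a_\nu^{(1)} \\ 0 & \text{otherwise} \end{cases}$$

and $u_{\mu\nu}(t)$ is the value of the variable connection from $a_\nu^{(1)}$ to $a_\mu^{(2)}$. This represents the sum of the signal arriving at $a_\mu^{(2)}$ on its fixed connection, and all of the signals arriving on the variable-valued connections at time $t$. Let

$$\beta_\mu^{(i)} = \theta\, e_\mu^{(i)} \tag{16.2}$$

$$\gamma_\mu^{(i)}(t) = \sum_{\nu=1}^{N_a} u_{\mu\nu}(t)\, e_\nu^{(i)} \tag{16.3}$$

Then

$$\alpha_\mu^{(i)}(t) = \beta_\mu^{(i)} + \gamma_\mu^{(i)}(t) \tag{16.4}$$

* The indices $i$, $j$, and $k$ will be used throughout this chapter to designate various stimuli, and the indices $\mu$ and $\nu$ will be used to designate particular A-units.

Note that $\beta_\mu^{(i)}$ is $\theta$ or 0 depending on whether $a_\mu^{(1)}$ is in $A^{(1)}(S_i)$ or not; it is invariant with time. On the other hand, $\gamma_\mu^{(i)}(t)$ represents the effect of the variable $A^{(1)}$ to $A^{(2)}$ connections.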


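Now suppose that at time $t_0$ stimulus $S_j$ occurs, and at time $t_0 + \Delta t$ stimulus $S_k$ occurs. Then the consequent change in $u_{\mu\nu}$ will be

$$u_{\mu\nu}(t_0 + 2\Delta t) - u_{\mu\nu}(t_0 + \Delta t) = (\eta\,\Delta t)\, e_\nu^{(j)}\, \phi\big(\alpha_\mu^{(k)}(t_0 + \Delta t)\big) - (\delta\,\Delta t)\, u_{\mu\nu}(t_0 + \Delta t) \tag{16.5}$$

where

$$\phi(\alpha) = \begin{cases} 0 & \text{for } \alpha < \theta \\ 1 & \text{for } \alpha \geq \theta \end{cases}$$

From (16.3) and (16.5) we get

$$\gamma_\mu^{(i)}(t_0 + 2\Delta t) - \gamma_\mu^{(i)}(t_0 + \Delta t) = (\eta\,\Delta t)\, \phi\big(\alpha_\mu^{(k)}(t_0 + \Delta t)\big) \sum_{\nu=1}^{N_a} e_\nu^{(i)} e_\nu^{(j)} - (\delta\,\Delta t) \sum_{\nu=1}^{N_a} u_{\mu\nu}(t_0 + \Delta t)\, e_\nu^{(i)}$$

Hence, since $\sum_\nu e_\nu^{(i)} e_\nu^{(j)} = N_a Q_{ij}^{(1)}$,

$$\gamma^{(i)}(t_0 + 2\Delta t) - \gamma^{(i)}(t_0 + \Delta t) = (\eta\,\Delta t)\, N_a Q_{ij}^{(1)}\, \phi\big(\alpha^{(k)}(t_0 + \Delta t)\big) - (\delta\,\Delta t)\, \gamma^{(i)}(t_0 + \Delta t) \tag{16.6}$$

where, for brevity, the subscript $\mu$ has been suppressed. It must be remembered that $\gamma$ and $\alpha$, in these equations, refer to any particular $A^{(2)}$ unit, $a_\mu^{(2)}$.

Now suppose the sequence of stimuli $\{S_{j_0}, S_{j_1}, \ldots, S_{j_M}\}$ occurs at the successive times $t, t + \Delta t, \ldots, t + M\Delta t$. In Equation (16.6) we take $t_0 = t + m\Delta t$, $j = j_m$, $k = j_{m+1}$ ($m = 0, 1, \ldots, M-1$), and obtain

$$\gamma^{(i)}(t + (m+2)\Delta t) - \gamma^{(i)}(t + (m+1)\Delta t) = (\eta\,\Delta t)\, N_a Q_{i j_m}^{(1)}\, \phi\big(\alpha^{(j_{m+1})}(t + (m+1)\Delta t)\big) - (\delta\,\Delta t)\, \gamma^{(i)}(t + (m+1)\Delta t) \tag{16.7}$$

Summing on $m$ from 0 to $M-1$ we get the change in $\gamma^{(i)}$ due to the entire sequence of stimuli:

$$\gamma^{(i)}(t + (M+1)\Delta t) - \gamma^{(i)}(t + \Delta t) = \sum_{m=0}^{M-1} \Big[ (\eta\,\Delta t)\, N_a Q_{i j_m}^{(1)}\, \phi\big(\alpha^{(j_{m+1})}(t + (m+1)\Delta t)\big) - (\delta\,\Delta t)\, \gamma^{(i)}(t + (m+1)\Delta t) \Big] \tag{16.8}$$

We now divide by $M\Delta t$ and let $\Delta t$ approach zero to obtain*

$$\frac{d\gamma^{(i)}}{dt} = \frac{\eta N_a}{M} \sum_{m=0}^{M-1} Q_{i j_m}^{(1)}\, \phi\big(\alpha^{(j_{m+1})}(t)\big) - \delta\, \gamma^{(i)}(t) \tag{16.9}$$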


* An alternative treatment is possible in which difference equations are carried throughout, rather than converting to a differential equation. The true solution for $\gamma^{(i)}(t)$ obtained from such an approach is a fluctuating function, the local time-average of which corresponds to the solution of the differential equation, which is obtained here. As long as $\eta$ and $\delta$ are sufficiently small, the differential equation, which is somewhat easier to manage, yields a close approximation to the true solution of the finite difference equation.

Let $n_{jk}$ be the number of times the stimulus pair $S_j S_k$ occurs in the given sequence $S_{j_0}, \ldots, S_{j_M}$; also, let $f_{jk} = n_{jk}/M$ be the average frequency of the pair $S_j S_k$. Then from (16.9) we get

$$\frac{d\gamma^{(i)}}{dt} = \eta N_a \sum_{j=1}^{n} \sum_{k=1}^{n} Q_{ij}^{(1)} f_{jk}\, \phi\big(\alpha^{(k)}\big) - \delta\, \gamma^{(i)} \tag{16.10}$$

where $n$, as usual, represents the number of distinct stimulus patterns in the environment. Defining the matrix $C = \eta N_a Q^{(1)} F$, with elements

$$c_{ik} = \eta N_a \sum_{j=1}^{n} Q_{ij}^{(1)} f_{jk}$$

we have from (16.10)

$$\frac{d\gamma^{(i)}}{dt} = \sum_{k=1}^{n} c_{ik}\, \phi\big(\beta^{(k)} + \gamma^{(k)}\big) - \delta\, \gamma^{(i)} \tag{16.11}$$

This gives us a non-linear system of differential equations for $\{\gamma^{(1)}(t), \ldots, \gamma^{(n)}(t)\}$ with initial conditions $\gamma^{(i)}(0) = 0$.

If the frequencies $f_{jk}$ vary with $t$, then the coefficients $c_{ik}$ are time-dependent, but in any case they are non-negative and bounded; $\phi$ is non-negative, monotone increasing in $\gamma$, bounded and
continuous on the right. It will be assumed here that the $c_{ik}$ are constants (corresponding to fixed frequencies, $f_{jk}$).

In preparation for discussing the solution of (16.11), consider the equilibrium equation

$$\gamma^{(i)} = \frac{1}{\delta} \sum_{k=1}^{n} c_{ik}\, \phi\big(\beta^{(k)} + \gamma^{(k)}\big) \tag{16.12}$$

This corresponds to a solution of (16.11) for the steady-state condition in which the rate of gain (represented by the first term of 16.11) is exactly counterbalanced by the rate of decay. But the system of equations (16.12) may have more than one solution. However, we shall show that there is a unique minimal solution (by which we mean a solution none of whose components exceed the corresponding components of another solution); and this minimal solution is obtained in a finite number (at most $n$) of iterations of (16.12), starting with all $\gamma^{(k)} = 0$ on the right-hand side of the equation, finding the new values of $\gamma^{(i)}$ from (16.12), putting these back into the right-hand side, and so on. That is, we take $\gamma_0^{(i)} = 0$ and

$$\gamma_{m+1}^{(i)} = \frac{1}{\delta} \sum_{k=1}^{n} c_{ik}\, \phi\big(\beta^{(k)} + \gamma_m^{(k)}\big), \qquad m = 0, 1, \ldots \tag{16.13}$$


We shall prove first that this process terminates in at most $n$ iterations. This can be seen from the following considerations. Since the right-hand side of (16.13) is non-negative and $\gamma_0^{(i)} = 0$, it follows that $\gamma_1^{(i)} \geq \gamma_0^{(i)}$. Now since the right side of (16.13) is a non-decreasing function of the $\gamma$'s, it follows that $\gamma_{m+1}^{(i)} \geq \gamma_m^{(i)}$ for every $m$. Therefore, also $\phi\big(\beta^{(i)} + \gamma_{m+1}^{(i)}\big) \geq \phi\big(\beta^{(i)} + \gamma_m^{(i)}\big)$; that is, successive $\phi$'s cannot decrease. If, at a particular step, no $\phi$ increases, then we are at a solution. The $\phi$'s have only the values zero or 1, so even if only a single $\phi$ changes at each step, the process terminates in at most $n$ steps.
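
The iteration (16.13) is short enough to write out directly. The sketch below is a minimal illustration, assuming a non-negative matrix `C` given as a list of lists; the function names and the small two-stimulus example values are hypothetical and chosen only to show the process stopping once no threshold function changes.

```python
def phi(x, theta):
    """Threshold function: 1 when the input reaches theta, else 0."""
    return 1 if x >= theta else 0


def minimal_equilibrium(C, beta, theta, delta):
    """Iterate gamma <- (1/delta) * C * phi(beta + gamma), starting from gamma = 0.

    Returns (gamma, active, changes), where `active` holds the 0/1 values of phi
    at the fixed point and `changes` counts the passes in which some phi rose.
    """
    n = len(beta)
    active = [phi(beta[k], theta) for k in range(n)]  # phi evaluated at gamma = 0
    for changes in range(n + 1):
        gamma = [sum(C[i][k] * active[k] for k in range(n)) / delta
                 for i in range(n)]
        new_active = [phi(beta[k] + gamma[k], theta) for k in range(n)]
        if new_active == active:      # no phi increased: this is a solution
            return gamma, active, changes
        active = new_active
    raise ValueError("no fixed point within n + 1 passes; is C non-negative?")


# Hypothetical two-stimulus example (values are illustrative only).
gamma, active, changes = minimal_equilibrium(
    C=[[0.5, 0.6], [1.2, 0.5]], beta=[1.0, 0.0], theta=1.0, delta=1.0
)
```

Because the threshold values can only rise from pass to pass, the loop bound of `n + 1` passes mirrors the termination argument given above.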


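The solution thus obtained will be denoted by $\gamma^{(i)*}$. We shall now prove that this solution is minimal. Let $\bar{\gamma}^{(i)}$ be any solution of the equilibrium equation (16.12). Then for the iteration process (16.13), we have $\gamma_0^{(i)} = 0 \leq \bar{\gamma}^{(i)}$, for all $i$. Since the right-hand side of (16.13) is a monotone function of $\gamma$, we have

$$\gamma_1^{(i)} = \frac{1}{\delta} \sum_{k=1}^{n} c_{ik}\, \phi\big(\beta^{(k)} + \gamma_0^{(k)}\big) \leq \frac{1}{\delta} \sum_{k=1}^{n} c_{ik}\, \phi\big(\beta^{(k)} + \bar{\gamma}^{(k)}\big) = \bar{\gamma}^{(i)}$$

Similarly, $\gamma_2^{(i)} \leq \bar{\gamma}^{(i)}$, and so on; hence $\gamma^{(i)*} \leq \bar{\gamma}^{(i)}$. Hence $\gamma^{(i)*}$ is minimal.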


To avoid consideration of a special pathological case, we now make a mild assumption. Consider the sum $\frac{1}{\delta} \sum_{k \in R} c_{ik}$ taken over a subset $R$ of the possible values of $k$ ($1 \leq k \leq n$). We assume that no such sum is equal to $\theta$. This is not a serious assumption, since by a small change in $\eta/\delta$ this requirement can always be satisfied.


Now suppose that the $\gamma^{(i)}(t)$ satisfy the system of differential equations (16.11) and the initial conditions $\gamma^{(i)}(0) = 0$. Then we assert that the $\gamma^{(i)}(t)$ are non-decreasing and $\lim_{t \to \infty} \gamma^{(i)}(t) = \gamma^{(i)*}$. That is, the solution obtained by the iterative
process (16.13) is indeed the limiting value of the solution of the differential equation (16.11),
with initial conditions zero in each case.

First we shall show that

$$\frac{d\gamma^{(i)}}{dt} \geq 0. \quad \text{Moreover, if } \gamma^{(i)}(t) > 0, \text{ then } \frac{d\gamma^{(i)}}{dt} > 0. \tag{$\alpha$}$$

As a preliminary step, consider the nature of the solution of the equation $\frac{dx}{dt} = M - \delta x$, where $M$ and $\delta$ are positive constants, and $x(0) = x_0$, where $0 \leq x_0 < M/\delta$. The solution,

$$x = \frac{M}{\delta} - \left( \frac{M}{\delta} - x_0 \right) e^{-\delta t},$$

has the appearance of the following curve:

[Figure (a): $x(t)$ rising from $x_0$ toward the asymptote $M/\delta$]

The solution approaches $M/\delta$ monotonely from below, and $x < M/\delta$ for all $t \geq 0$. If at time $t = t_1$ we replace $M$ by $M_1 > M$ the solution appears as

[Figure (b): $x(t)$ rising toward $M/\delta$ until $t_1$, then toward $M_1/\delta$]

as $t$ goes from 0 to $t_1$ the solution approaches $M/\delta$ monotonely from below; as $t$ increases beyond $t_1$ the solution approaches $M_1/\delta$ monotonely from below. The solution is continuous; so is its derivative, except at $t_1$ where the left and right hand derivatives are not equal, but both are positive. If instead of $M/\delta > x_0 \geq 0$, we take $M = x_0 = 0$, the solution is $x(t) = 0$ for $0 \leq t \leq t_1$.
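
As a quick numerical check on the preliminary step, the closed-form solution of $dx/dt = M - \delta x$ can be compared against a crude forward-Euler integration. This is only a sketch; the function names and the particular values of `M`, `delta`, and the step count are illustrative assumptions.

```python
import math


def x_closed(t, M, delta, x0=0.0):
    """Closed-form solution of dx/dt = M - delta * x with x(0) = x0."""
    return M / delta - (M / delta - x0) * math.exp(-delta * t)


def x_euler(t_end, M, delta, x0=0.0, steps=100000):
    """Forward-Euler integration of the same equation, for comparison."""
    dt = t_end / steps
    x = x0
    for _ in range(steps):
        x += dt * (M - delta * x)
    return x


# The two should agree closely, and the closed form should rise toward M/delta.
difference = abs(x_closed(2.0, 3.0, 0.5) - x_euler(2.0, 3.0, 0.5))
```

The comparison also illustrates the remark made earlier that, for small steps, the difference-equation treatment stays close to the differential equation.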


We now proceed to the proof of ($\alpha$). Let

$$M^{(i)}(t) = \sum_{k=1}^{n} c_{ik}\, \phi\big(\beta^{(k)} + \gamma^{(k)}(t)\big)$$

Then (16.11) can be written

$$\frac{d\gamma^{(i)}}{dt} = M^{(i)}(t) - \delta\, \gamma^{(i)}(t) \tag{16.14}$$

where here and in the following paragraph, $i$ is a generic index of the set $(1, 2, \ldots, n)$, while $j$ and $k$ will refer to specific indices to be defined below.

Each function $M^{(i)}(t)$ can take on at most $2^n$ possible values. Let $j$ be a specific value of $i$ and suppose first that $M^{(j)}(0) > 0$. The only times at which $M^{(i)}(t)$ can change its value are when one of the $\gamma^{(k)}(t)$ (indeed one whose corresponding $\beta^{(k)} = 0$) reaches the value $\theta$. Suppose the first time at which this happens is $t_1$. Suppose then that $\gamma^{(k)}(t_1) = \theta$. Since in the interval $0 \leq t < t_1$ all $\frac{d\gamma^{(i)}}{dt} \geq 0$, we have $M^{(i)}(t_1) \geq M^{(i)}(0)$. Thus the solution appears as in Figure (b) above; in particular, for all $i$ such that $M^{(i)}(0) > 0$ we have $\gamma^{(i)}(t_1) < M^{(i)}(t_1)/\delta$ and $\frac{d\gamma^{(i)}}{dt} > 0$; and for the others $\frac{d\gamma^{(i)}}{dt} \geq 0$. Furthermore, since both the left and right derivatives of $\gamma^{(k)}$ at $t_1$ are positive we have, for $t > t_1$ and sufficiently close to $t_1$, $\gamma^{(k)}(t) > \theta$, so that it will not be until $t_2$, with $t_2 > t_1$, that there will again be a $\gamma^{(i)}(t)$ having the value $\theta$. In the interval $t_1 < t < t_2$ we have the same pertinent conditions as we had in the interval $0 < t < t_1$; namely, $\frac{d\gamma^{(i)}}{dt} = M^{(i)}(t_1) - \delta\, \gamma^{(i)}(t)$, with initial values $0 \leq \gamma^{(i)}(t_1) < M^{(i)}(t_1)/\delta$ wherever $M^{(i)}(t_1) > 0$. Thus in the interval $t_1 < t < t_2$ we again have $\gamma^{(i)}(t) < M^{(i)}(t_1)/\delta$, and $\frac{d\gamma^{(i)}}{dt} > 0$, for these $i$. The same argument applies to successive intervals $(t_2, t_3)$, $(t_3, t_4)$, and so on. Since the $\phi$'s are monotone there are at most $n$ such intervals.

If $M^{(j)}(0) = 0$, then $\gamma^{(j)}(t) = 0$ for $0 \leq t \leq t_1$. If $M^{(j)}(t_1) > 0$, then we use the previous argument starting at $t = t_1$. Otherwise $\gamma^{(j)}$ remains zero at least until $t_2$, and so on. In any case, the statement ($\alpha$) has been proven.

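Next we shall show that

$$\lim_{t \to \infty} \gamma^{(i)}(t) = \gamma^{(i)*} \tag{$\beta$}$$

Since, from the proof of ($\alpha$) it is clear that each $\gamma^{(i)}(t)$ is monotone and bounded, $\lim_{t \to \infty} \gamma^{(i)}(t)$ exists; call it $\Gamma^{(i)}$; it is a sum of the form $\frac{1}{\delta} \sum_{k \in R} c_{ik}$, which was assumed at the outset to be unequal to $\theta$, and thus $\Gamma^{(i)} \neq \theta$. Therefore, $\phi\big(\beta^{(i)} + \gamma\big)$ is continuous when $\gamma = \Gamma^{(i)}$. Letting $t \to \infty$ in equation (16.11) we see that $\Gamma^{(i)}$ is a solution of the equilibrium equation (16.12). Hence $\Gamma^{(i)} \geq \gamma^{(i)*}$, since $\gamma^{(i)*}$ is minimal. We next show that for all $t \geq 0$, $\gamma^{(i)}(t) \leq \gamma^{(i)*}$.

Note that initially $\gamma^{(i)}(0) = 0 \leq \gamma^{(i)*}$. Suppose that $t_1$ is the first time at which some $\gamma^{(i)}(t) = \gamma^{(i)*}$. From (16.11) and the fact that $\phi$ is non-decreasing we see that at $t_1$,

$$\frac{d\gamma^{(i)}}{dt} = \sum_{k=1}^{n} c_{ik}\, \phi\big(\beta^{(k)} + \gamma^{(k)}(t_1)\big) - \delta\, \gamma^{(i)*} \leq \sum_{k=1}^{n} c_{ik}\, \phi\big(\beta^{(k)} + \gamma^{(k)*}\big) - \delta\, \gamma^{(i)*} = 0$$

i.e., $\frac{d\gamma^{(i)}}{dt} \leq 0$ at $t = t_1$.

If $\gamma^{(i)*} > 0$, we have from ($\alpha$) that $\frac{d\gamma^{(i)}}{dt} > 0$ at $t_1$, which is a contradiction. Suppose that $\gamma^{(i)*} = 0$, so that also $\gamma^{(i)}(t_1) = 0$. Then, as long as no $\gamma^{(k)}(t)$ reaches a non-zero $\gamma^{(k)*}$, we have

$$0 \leq \frac{d\gamma^{(i)}}{dt} = \sum_{k=1}^{n} c_{ik}\, \phi\big(\beta^{(k)} + \gamma^{(k)}(t)\big) - \delta\, \gamma^{(i)}(t) \leq \delta \big( \gamma^{(i)*} - \gamma^{(i)}(t) \big) = -\delta\, \gamma^{(i)}(t) \leq 0$$

Hence over this period $\gamma^{(i)}(t) = 0$. But no non-zero $\gamma^{(k)*}$ can ever be attained by $\gamma^{(k)}(t)$, since, by the above argument, we would have $\frac{d\gamma^{(k)}}{dt} \leq 0$ at the first time this occurs, in contradiction to ($\alpha$).

Hence if $\gamma^{(i)*} > 0$, then $\gamma^{(i)}(t) < \gamma^{(i)*}$; and if $\gamma^{(i)*} = 0$, then $\gamma^{(i)}(t) = 0$. In general, $\gamma^{(i)}(t) \leq \gamma^{(i)*}$. Hence $\Gamma^{(i)} = \lim_{t \to \infty} \gamma^{(i)}(t) \leq \gamma^{(i)*}$, and ($\beta$) follows.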


From this point on, we shall be concerned with the steady-state values $\gamma^{(i)*}$, and for brevity we shall drop the $*$. In the terminal condition, the A-unit $a_\mu^{(2)}$, whose history we have been following up to this point, is activated by $S_i$ if $\beta^{(i)} + \gamma^{(i)} \geq \theta$. The set of $A^{(2)}$ units which are activated by stimulus $S_i$ is denoted by $A^{(2)}(S_i)$. In the initial state, the set $A^{(2)}(S_i)$ is denoted by $A_0^{(2)}(S_i)$, and in the terminal state by $A_\infty^{(2)}(S_i)$. The expected fraction of $A^{(2)}$ units which are activated by both $S_i$ and $S_j$ will be $Q_{ij}^{(2)}$ and is equal to the expected number of units in $A^{(2)}(S_i) \cap A^{(2)}(S_j)$ divided by $N_a$.

Once the $Q_{ij}^{(2)}$ are known, the behavior of the perceptron in its terminal (steady state) condition can be predicted. To determine these terminal values of $Q_{ij}^{(2)}$, we can proceed as follows. First, the set of $A^{(1)}$ units is broken into the smallest possible cells of the Venn diagram which represents the sets of units responding to different stimuli (cf. Fig. 43). For the units in each of these cells, there is a characteristic $\beta$-vector. For each such $\beta$-vector, we solve equation (16.12) for the terminal values of $\gamma^{(i)}$. Here we assume $f_{jk}$ to be given, and $Q_{ij}^{(1)}$ can be obtained from previous equations (as in Chapter 6). Initially, $Q_{ij}^{(2)} = Q_{ij}^{(1)}$. Knowing $\beta^{(i)}$ and $\gamma^{(i)}$, we can determine the region of the $A^{(2)}$ Venn diagram to which each cell of units moves. Thus we obtain the complete terminal distribution of A-units in the Venn diagram of $A^{(2)}$, and hence in particular the $Q_{ij}^{(2)}$. It can be seen that the motion will be for A-units to tend to go into higher-order intersections, but that points which are initially outside all the $A^{(2)}(S_i)$ will stay outside all the $A^{(2)}(S_i)$.
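
The cell-by-cell procedure described above can be sketched as follows. Each cell of the Venn diagram is represented here by a 0/1 tuple saying which stimuli initially activate its units; the iteration of (16.12) then gives the cell to which those units move. The function names and the matrix `C_example` are hypothetical, chosen only so that the first two stimuli reinforce each other strongly.

```python
from itertools import product


def terminal_cell(C, cell, theta, delta):
    """Terminal Venn-diagram cell for units that start in `cell`.

    cell[i] is 1 when stimulus i initially activates the unit (beta = theta),
    and 0 otherwise (beta = 0). C is assumed non-negative.
    """
    n = len(cell)
    beta = [theta * c for c in cell]
    active = list(cell)                       # phi evaluated at gamma = 0
    for _ in range(n + 1):
        gamma = [sum(C[i][k] * active[k] for k in range(n)) / delta
                 for i in range(n)]
        new_active = [1 if beta[i] + gamma[i] >= theta else 0 for i in range(n)]
        if new_active == active:
            break
        active = new_active
    return tuple(active)


def trace_cells(C, theta, delta):
    """Map every initial cell (all 2**n of them) to its terminal cell."""
    n = len(C)
    return {cell: terminal_cell(C, cell, theta, delta)
            for cell in product((0, 1), repeat=n)}


# Hypothetical C matrix: stimuli 0 and 1 are strongly coupled, stimulus 2 is not.
C_example = [[0.2, 0.9, 0.0],
             [0.9, 0.2, 0.0],
             [0.0, 0.0, 0.3]]
moves = trace_cells(C_example, theta=1.0, delta=0.8)
```

In this illustrative case, units that begin in the cell of the first stimulus alone end in the intersection of the first two, while units outside every set stay outside; weighting each initial cell by its share of units would then give the terminal $Q^{(2)}$ matrix.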

16.2.2 A Numerical Example

To clarify the above description, an illustrative example is worked out here numerically. Suppose there are three stimuli, $S_1$, $S_2$, and $S_3$, which initially activate sets of $A^{(1)}$ units (or sets of $A^{(2)}$ units, which will be equivalent under starting conditions) shown in the Venn diagram of Figure 43(a). Here the $Q_{ij}^{(1)}$ matrix, and the initial value of the $Q_{ij}^{(2)}$ matrix, is


Suppose a fixed sequence of these stimuli is repeated over and over during the training, or "preconditioning", of the perceptron. Then the $f_{jk}$ matrix, from the above analysis, is

$$F = \begin{pmatrix} 0 & .3 & .1 \\ .2 & .1 & .1 \\ .2 & 0 & 0 \end{pmatrix}$$


Consequently,  we  have  the  matrix 


'‘j  too 


I  ■■ 
I  ■ 


c  : 


r4 

.10 

.04 

14 

.07 

/■'  ^ 

Id 

. "  6 

..4 

The  equilibrium  equations  (16.  12)  then  become 


/  \ 


t  I 

I .  •• 


■I  ! 


^  l.'J 

■'I  .7 


\ 


■  7\ 

I  : 


(16.15) 


Now  we  begin  to  trace  the  destinations  of  cells  of  the  Venn  diagram  of 
Fig.  43(a).  Start  with  the  two  A-units  which  are  activated  only  by 
Here  V  r  ■  ' .  .  ,  '/  .  The  first  iteration  of  (16.  15)  then  gives 


-7 


\ 

}  ' 

1  1.4 

I 

/ 

If  '/?  0  /. '^  ,  then  these  '/f  r  are  zero,  and  the  points  in  question 

stay  in  the  same  resion  of  the  Venn  diagram.  To  be  specific , 
let  us  take  /  d  -  I  .  Tlien  we  get  for  the  first  approximation 


.  I  . 

\ 


I  r 
\ 


!  ..•/ 
\/.5/ 
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and  for  the  second  iteration, 


which is the fixed point. The two associators in question have consequently
moved into the triple intersection of the Venn diagram in Fig. 43.

Continuing in this fashion with each of the eight cells of the Venn diagram, we finally arrive at the terminal distribution shown in Fig. 43(b). For this we have the terminal Q-matrix:

!  \ 

I  ■  n  ,  I)  \ 

\  .n  .V  ! 

The stimuli $S_1$ and $S_2$ have become indistinguishable. The G-matrix for
an $\alpha$-system is the same as $Q^{(2)}$, while for a $\gamma$-system, it would be


4 

./ 


.  .  6 


.  -y 


The "coagulation" of $S_1$ and $S_2$ corresponds to the fact that in the training sequence (which is reflected in the $f_{jk}$ matrix) $S_1$ and $S_2$ follow one another quite frequently, whereas they are very rarely followed by $S_3$. Consequently, $S_3$ tends to remain distinct, in the terminal G-matrix. In the following section, it will be seen that such behavior is quite characteristic of this system.*

* Another numerical example will be found in Section 17.2, where the four-layer system is compared with an open-loop cross-coupled model.

16.3 Organization of Dichotomies


The  general  analysis  of  the  preceding  section  can  be  applied 
to  a  large  variety  of  particular  experimental  designs.  To  begin  with,  we 
will  show  that  with  a  suitable  choice  of  parameters  for  the  perceptron,  and 
a  suitable  sequence  of  stimuli,  a  perceptron  can  spontaneously  dichotomize 
an  environment  into  any  two  classes,  without  any  control  of  the  reinforcement 
process by an external agency or experimenter. The organization of the
stimulus sequence will determine the particular dichotomy which is formed.*


Let the sequence of stimuli to which the perceptron is exposed be $S_{j_0}, S_{j_1}, \ldots, S_{j_M}$. In the following discussion, such a sequence will be called a "preconditioning sequence". Let $P_j$ denote the fraction of occurrences of $S_j$ in the given sequence, and let $P_{jk}$ denote the number of times $S_k$ immediately follows $S_j$ divided by the number of times $S_j$ occurs. Then $f_{jk} = n_{jk}/M$. With a sufficiently long sequence, $f_{jk} \approx P_j P_{jk}$, and the equilibrium equation takes the form:

$$\gamma^{(i)} = \frac{\eta N_a}{\delta} \sum_{j=1}^{n} \sum_{k=1}^{n} Q_{ij}^{(1)} P_j P_{jk}\, \phi\big(\beta^{(k)} + \gamma^{(k)}\big) \tag{16.16}$$

where $P_j$ corresponds to the probability of $S_j$, and $P_{jk}$ corresponds to the transition probability that $S_k$ occurs at time $t + \Delta t$, given that $S_j$ occurred at time $t$.


* This can be interpreted as an R-controlled reinforcement system, although it does not actually depend on the outputs of the R-units in any essential way.

EXPERIMENT 10: Take an environment, $W$, consisting of $n$ stimuli, such that there is no appreciable difference in the retinal overlap of different pairs of stimuli. (With a large retina, a set of random dot stimuli will generally satisfy this condition.) Divide the stimuli arbitrarily into two classes, so that $S_1, \ldots, S_K$ are in Class $X$, while $S_{K+1}, \ldots, S_n$ are in Class $Y$. All members of a given class are equally likely to occur. Let the probability of transition to a member of the same class be $p$, nearly unity, and to a member of the opposite class be $1 - p$, nearly zero. Let the perceptron be exposed to an extended preconditioning sequence composed according to these probabilities, without any control by the r.c.s. At the end of the preconditioning sequence, the perceptron is exposed to a short additional sequence composed in the same manner, during which R-controlled reinforcement is administered, according to the rules of the $\gamma$-system, for A-unit to R-unit connections. The values of all connections are then "frozen", and the response of the perceptron to each stimulus in $W$ is ascertained.
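
A preconditioning sequence of the kind called for in Experiment 10 can be generated with a simple two-class Markov chain. The sketch below is illustrative only: the function names, the integer labelling of stimuli (class $X$ as `0..K-1`, class $Y$ as `K..K+L-1`), and the seed handling are assumptions, not part of the experiment's description.

```python
import random
from collections import Counter


def preconditioning_sequence(K, L, p, length, seed=0):
    """Random stimulus sequence: stay in the current class with probability p.

    Stimuli 0..K-1 form class X and K..K+L-1 form class Y; within the chosen
    class every member is equally likely.
    """
    rng = random.Random(seed)
    class_x = list(range(K))
    class_y = list(range(K, K + L))
    current = rng.choice(class_x + class_y)
    seq = [current]
    for _ in range(length - 1):
        in_x = current < K
        stay = rng.random() < p
        pool = class_x if in_x == stay else class_y   # same class iff `stay`
        current = rng.choice(pool)
        seq.append(current)
    return seq


def transition_frequencies(seq):
    """Pair frequencies f[(j, k)] = n_jk / M for successive stimuli j -> k."""
    pairs = Counter(zip(seq, seq[1:]))
    M = len(seq) - 1
    return {jk: count / M for jk, count in pairs.items()}


seq = preconditioning_sequence(K=3, L=3, p=0.95, length=20001, seed=1)
f = transition_frequencies(seq)
```

Summing `f` over pairs that lie in the same class gives an estimate of $p$, which is the only property of the sequence that distinguishes the two classes.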

It  can  be  seen  that  this  experiment  is  closely  analogous  to 
Experiment  9,  in  which  the  effects  of  R-controlled  reinforcement  were 
determined  for  an  environment  of  horizontal  and  vertical  bars,  except  for 
the  preconditioning  sequence  (which  would  have  no  effect  at  all  in  a  simple 
perceptron), and the additional condition that there is no way of determining
whether two stimuli belong in the same or opposite classes on the basis of
their  retinal  overlap.  The  only  thing  which  characterizes  two  members  of  the 
same  class  differently  from  stimuli  of  opposite  classes  is  the  difference  in 
transition  probabilities  in  the  preconditioning  sequence. 


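We assume $N_a Q_{ii}^{(1)} = q + \Delta$ for all $i$, and $N_a Q_{ij}^{(1)} = q$ for all $i \neq j$, where $q \geq 0$ and $\Delta > 0$. Thus the diagonal elements of the $Q^{(1)}$ matrix are all $(q + \Delta)/N_a$ and all other elements are $q/N_a$. (Note that by raising thresholds of the $A^{(1)}$ units, with a sufficient number of connections, the ratio $q/\Delta$ can be made as small as desired.) For the probabilities of stimulus-occurrence indicated in the experiment, we have

$$P_j = \begin{cases} 1/2K & \text{for } S_j \text{ in } X \\ 1/2L & \text{for } S_j \text{ in } Y \end{cases}$$

where $L = n - K$, and

$$P_{jk} = \begin{cases} p/K & \text{for } S_j \text{ in } X,\ S_k \text{ in } X \\ (1-p)/L & \text{for } S_j \text{ in } X,\ S_k \text{ in } Y \\ (1-p)/K & \text{for } S_j \text{ in } Y,\ S_k \text{ in } X \\ p/L & \text{for } S_j \text{ in } Y,\ S_k \text{ in } Y \end{cases}$$

Then we obtain from (16.16),

$$\gamma^{(i)} = \frac{\eta}{\delta} \sum_{k=1}^{n} \Big[ q \sum_{j=1}^{n} P_j P_{jk} + \Delta\, P_i P_{ik} \Big] \phi\big(\beta^{(k)} + \gamma^{(k)}\big) \tag{16.17}$$

Let us now assume that $S_i$ is one of the stimuli of class $X$. Then

$$\gamma^{(i)} = \frac{\eta}{\delta} \left[ \frac{\Delta p + qK}{2K^2} \sum_{k=1}^{K} \phi\big(\beta^{(k)} + \gamma^{(k)}\big) + \frac{\Delta(1-p) + qK}{2KL} \sum_{k=K+1}^{n} \phi\big(\beta^{(k)} + \gamma^{(k)}\big) \right] \tag{16.18}$$

We now observe the following:

i) If $\dfrac{\eta(\Delta p + qK)}{2\delta K^2} \geq \theta$, then $A_\infty^{(2)}(S_i) \supseteq \bigcup_{S_j \in X} A_0^{(2)}(S_j)$.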


In words, if the stated inequality holds then, in the terminal condition, each of the stimuli of class $X$ activates the union of all sets which were initially activated by any of the stimuli of class $X$. That is, each stimulus of a given class has "captured" all of the A-units that initially responded to all of the other stimuli of that class. The proof follows from the fact that any $A^{(2)}$ unit which originally responded to any of the stimuli in class $X$ contributes a non-zero term in $\sum_{k=1}^{K}$ in (16.18). The postulated inequality then guarantees that the A-unit will be active in the terminal state.


ii) If $\dfrac{\eta[\Delta(1-p) + qK]}{2K\delta} < \theta$, then $A_\infty^{(2)}(S_i) \subseteq \bigcup_{S_j \in X} A_0^{(2)}(S_j)$.

In words, if the stated inequality holds then, in the terminal condition, no stimulus of class $X$ activates any A-unit outside of the union of sets initially activated by stimuli of class $X$. The proof follows from the fact that, if we were to solve (16.18) by iteration, then any A-unit which is activated by none of the $X$-stimuli has, on the first iteration, no contribution from $\sum_{k=1}^{K}$. In virtue of the assumed inequality it will not have any contribution on any following iteration either, and $\gamma^{(i)}$ remains less than $\theta$. Since only a finite number of iterations are involved, this unit does not become active.


iii) If the inequalities of (i) and (ii) both hold, then $A_\infty^{(2)}(S_i) = \bigcup_{S_j \in X} A_0^{(2)}(S_j)$.

Necessary and sufficient conditions for both (i) and (ii) to hold have been found by H. D. Block. They are:

a) $\dfrac{q}{\Delta} < \dfrac{1}{K(K-1)}$

b) $p > \dfrac{K\Delta + qK(K-1)}{(K+1)\Delta}$

c) $\dfrac{2K^2\theta}{\Delta p + qK} \leq \dfrac{\eta}{\delta} < \dfrac{2K\theta}{\Delta(1-p) + qK}$

Condition a) insures that a probability $p$ ($0 < p < 1$) can be chosen to satisfy b). Condition b) insures that $\eta/\delta$ can be chosen to satisfy c). The conditions can be written in the alternative form

a') $p > \dfrac{K}{K+1}$

b') $\dfrac{q}{\Delta} < \dfrac{(K+1)p - K}{K(K-1)}$

c') as above.

Under the conditions indicated, if Experiment 10 is completed by exposing the perceptron to a continuation of the same stimulus sequence with R-controlled $\gamma$-reinforcement, the first response to occur will immediately generalize to all stimuli of the same class as the one which evoked the response, since each member of the class activates the identical set of A-units, after the preconditioning sequence. Suppose a member of class $X$ is the first stimulus to occur, and that this happens to evoke the response $r^* = +1$. Then this response will be reinforced, and will generalize immediately to all other members of class $X$. However, under the conditions assumed above, the intersections between the sets of A-units initially responding to stimuli of class $X$ and stimuli of class $Y$ were all equal to $q$, and it was noted that by using large thresholds, $q$ could be made arbitrarily small relative to the measure of the responding A-sets. If each A-unit has a large number of distinct origin points (no two identical) $Q_{ij}$ can, in fact, be made small relative to the product $Q_i Q_j$. Thus, with a large threshold, in a $\gamma$-system, the generalization coefficient $g_{ij}$ for $S_i$ in $X$ and $S_j$ in $Y$ will be negative. Consequently, any stimuli of class $Y$ will automatically be assigned the opposite response from stimuli of class $X$. Thus a completely consistent dichotomy has been created, from the time the first stimulus of the terminal training sequence occurs. Further reinforcement will only strengthen the tendencies thus established.


If the ratio $\eta/\delta$ is made large enough, the perceptron in Experiment 10 will ultimately arrive at a state in which every stimulus activates all A-units which ever responded to any stimulus of either class. However, in practice, the constraints on the parameters need not be as severe as those indicated in conditions a), b), and c) above, in order to obtain useful generalization effects from the system. As long as $\eta/\delta$ is not so large as to cause a complete merging of all A-sets for all stimuli, it remains possible to teach the "preconditioned" perceptron to discriminate all stimuli of the two classes correctly with a single corrective reinforcement for one stimulus of each class, as long as the inequality $\dfrac{\eta(\Delta p + qK)}{2\delta K^2} \geq \theta$ is satisfied.


16.4 Organization of Multiple Classes

Suppose we have the same kind of environment as in Experiment 10, but that the stimuli are considered to be of, say, three classes, containing $K$, $L$, and $M$ stimuli respectively ($K + L + M = n$). We assume there is not too much overlap between the different types of stimuli, an assumption which will be made more precise below. (As in the previous case, the overlap can always be reduced as far as required by making $\theta$ sufficiently high.) The three classes will be called $X$, $Y$, and $Z$. We assume that the $Q^{(1)}$ matrix is

$$Q_{ij}^{(1)} = \begin{cases} q/N_a & \text{if } S_i \text{ and } S_j \text{ are in different classes} \\ (q + r)/N_a & \text{if } S_i \text{ and } S_j \text{ are in the same class, } S_i \neq S_j \\ (q + r + \Delta)/N_a & \text{if } S_i = S_j \end{cases}$$

From the nature of a Q-matrix it is necessary that $q \geq 0$, $(q + r) \geq 0$, and $(r + \Delta) \geq 0$. We assume $r \geq 0$.


Suppose that the transition probabilities are large ($p$) for transitions to a member of the same class, and small ($(1-p)/2$) to each of the other classes. Within a class each transition is equally likely. Then

$$P_{jk} = \begin{cases} p/K & S_j \text{ in } X,\ S_k \text{ in } X \\ p/L & S_j \text{ in } Y,\ S_k \text{ in } Y \\ p/M & S_j \text{ in } Z,\ S_k \text{ in } Z \\ (1-p)/2K & S_k \text{ in } X;\ S_j \text{ in } Y \text{ or } S_j \text{ in } Z \\ (1-p)/2L & S_k \text{ in } Y;\ S_j \text{ in } X \text{ or } S_j \text{ in } Z \\ (1-p)/2M & S_k \text{ in } Z;\ S_j \text{ in } X \text{ or } S_j \text{ in } Y \end{cases}$$

The probabilities of occurrence of individual stimuli are given by

$$P_j = \begin{cases} 1/3K & S_j \text{ in } X \\ 1/3L & S_j \text{ in } Y \\ 1/3M & S_j \text{ in } Z \end{cases}$$

Then  Equation  (16.  16)  becomes 


T‘ 


V 


if 


LL^L  L-L  Z'  Z  Z^Z  Z 

Acx  jgx  jiX  kiZ  JcY  4eX  JcY  k.eY 


(16.19) 


r 


ZZ'ZZ^ZZ'ZZ 

/£/  ^6Z  JiZ  &iX  JfZ  AiY  JeZ  S€Zj  L 


where, for simplicity of notation, X, Y, and Z have been used for the
appropriate index sets. Suppose i is in X (i.e., S_i is in X). Then
(16.19) yields


~  •  V  ,  4  ...  (41 


(16.20) 


I  4^  • 


2: 


I  Y  AeZ 


(4/ 


^  0 


We  can  now  assert  the  following: 


i)  H 


f/  \ jp(Kr  -h  A j  + 

- — ^ - - - >  0  .  then  the  set 

3  f  K  ^  ^  ^  X- 


u 

Sj€X  °  ^ 
That is, if the stated inequality holds, then every A unit which initially
responded to any stimulus in class X now responds to each stimulus in
class X. This is readily proven by noting that for any A unit which
initially responds to any of the stimuli in class X there is at least one
non-zero term in the sum over j in X in (16.20). The postulated inequality
then guarantees that the threshold θ is reached for any i such that S_i
is in X.


ii)  If 


V-? 


'  i-  P)(  i 


bKff 


<  9  ,  then  —  U  a/^(S;) 


S-eX 

j 


That is, if the stated inequality holds, then every A unit which did not
initially respond to at least one of the stimuli of class X does not respond
to any class X stimulus in the terminal state. This is proven as follows.
For an A-unit which does not respond to any stimulus of class X, none of
the terms in the sum over j in X in (16.18) are present on the first
iteration, which starts with γ = 0. The stated inequality guarantees that,
even if all the other terms are present, no signal for a class X stimulus
will reach θ. Thus no terms in the sum over j in X will ever be non-zero.


iii)  If  both  of  the  above  inequalities  hold,  then  A  ^  3  ^  fS:) 

'■  -rX 
-  .1 


That is, each stimulus in class X activates exactly the same set of A
units in the terminal state; and that set consists of just those A-units which
originally were activated by any one of the stimuli of class X.

Necessary and sufficient conditions that the inequalities of both
i) and ii) be satisfied have again been derived by Block, and are

a)  r  >  /-. 

b)  <  (kr  >  il / ■'’ 

c  )  p  >  \  ,  (-j  K  [k  -  \ }  k  ^  -h  ^  k  r  +  k.){Kk'/> } 

d)  '  '  f_  k  r  ^  \  ^  k  1/ /•■■(?  <  6A  /  k'rf/Jf  7^^/c'] 

Condition  a)  guarantees  that  a  suitable  y  ^  i)  can  be  chosen 
in  b);  Condition  b)  guarantees  that  a  suitable  f)  k  I  can  be  chosen  in  c); 
Condition  c)  guarantees  that  an  /'  <!'  can  be  chosen  to  satisfy  d). 

If the parameters are suitably set we have seen that the response
in the A^(2) layer to any stimulus in class X is the union of the sets
A_0^(2)(S_j) over all S_j in X. Similarly for classes Y and Z. This means
that a γ-system perceptron with a
single R-unit will tend to assign the same response to all members of the
first class of stimuli to be represented in the training sequence. All other
stimuli will receive the opposite response, if the initial intersections of
responding A-sets are small enough. With more than one R-unit and inhibitory
connections between the R-units, so that only one can go on at one
time (cf. Chapter 20) it is thus possible for the perceptron to assign a
unique response to each stimulus class. If there is too much initial overlap
between the responding sets of A-units, or if condition i) is satisfied
without condition ii) being satisfied, a single corrective reinforcement applied
for any one stimulus of each class may still be sufficient to yield the correct
response for all stimuli in the environment.
16.5 Similarity Generalization


In  the  experiments  considered  above,  the  nature  of  the  stimulus 
classes  was  never  explicitly  stated.  Clearly,  they  could  have  been 
similarity  classes,  under  a  suitably  chosen  similarity  relation,  and  the  same 
results  would  have  been  obtained.  In  order  to  obtain  generalization  over  the 
entire  class,  however,  it  was  assumed  that  "runs"  of  stimuli  from  each  class 
occurred,  it  being  much  more  likely  that  a  stimulus  was  followed  by  another 
member  of  the  same  class  than  by  a  stimulus  from  a  different  class.  After  a 
long  preconditioning  sequence  of  this  type,  it  might  be  expected  that  the 
perceptron  would  have  seen  each  stimulus  in  the  environment  a  great  number 
of  times.  We  now  consider  the  generalization  of  a  similarity  relation  to 
stimuli  which  have  not  occurred  during  the  preconditioning  sequence. 


EXPERIMENT 11: Consider an environment of stimuli S_1, S_2, ..., S_n
and their transforms T(S_1), T(S_2), ..., T(S_n), where T is
any transformation in which the measure of fixed points is zero.
Let the perceptron be exposed to a preconditioning sequence,
consisting of stimuli followed by their transforms, i.e., a
sequence of the form S_i, T(S_i), S_j, T(S_j), S_k, T(S_k), ...,
where the subscripts are picked at random
from the set of integers 1 through n. Now consider a pair of
test stimuli, S_x and S_y, and their transforms T(S_x) and
T(S_y), none of which occurred during the preconditioning
sequence. Let one response be associated to S_x and the
opposite response to S_y, by means of an error correction
procedure. Now test the perceptron to determine its response
to T(S_x) and T(S_y).

* This is directly analogous to the phenomenon of similarity generalization
originally predicted for cross-coupled systems in Rosenblatt,
Ref. 85.
It is predicted that if this experiment is performed with random
dot stimuli in the preconditioning sequence, with a finite retina, and S_x
and S_y are any other stimuli (e.g., a square and a triangle, or two letters
of the alphabet) the transforms T(S_x) and T(S_y) will each tend to activate
the appropriate response, which was associated to S_x and S_y, respectively.
In other words, the perceptron will have learned that any two stimuli which
are similar under the transformation T are to be treated as equivalent,
even though the stimuli have never been seen before.
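The preconditioning sequence of Experiment 11 is simple to generate. The sketch below represents each stimulus as a tagged pair, ("S", i) for S_i and ("T", i) for T(S_i); that encoding, the function name, and the fixed random seed are illustrative assumptions and not part of the original experiment.

```python
import random

def preconditioning_sequence(n, pairs, seed=0):
    """Return a list of the form S_i, T(S_i), S_j, T(S_j), ...

    Each subscript is drawn at random from 1..n, and every stimulus is
    immediately followed by its own transform.
    """
    rng = random.Random(seed)
    sequence = []
    for _ in range(pairs):
        i = rng.randint(1, n)        # subscript picked at random from 1 through n
        sequence.append(("S", i))    # the stimulus S_i
        sequence.append(("T", i))    # its transform T(S_i)
    return sequence

seq = preconditioning_sequence(n=5, pairs=4)
print(seq)
```

The test stimuli S_x and S_y and their transforms never appear in such a sequence, which is the point of the experiment.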

To begin with, we consider the following problem, which is
essentially a special case of Experiment 11, performed with only a single
test stimulus, S_x.

Consider the stimuli S_1, ..., S_K and their transforms
S_(K+1) = T(S_1), ..., S_(2K) = T(S_K). For example,
S_1, ..., S_K may be in the left half of the field, and T a transformation
which moves them to the right half of the field. S_x = S_0 is not
shown during the preconditioning sequence, but is a test stimulus to be
applied later, as is its transform S_x' = T(S_x). Let us assume S_x
intersects S_1, ..., S_L to a larger extent than it does the others
and hence S_x' intersects mainly the stimuli S_(K+1), ..., S_(K+L). These
relationships are illustrated in Figure 44.
Figure 44 RELATIONSHIP OF TEST STIMULUS TO PRECONDITIONING STIMULI
AND TRANSFORMS


Specifically,  consider  the  conditions 


i  (q  -h  A  ify  ;)  N  ;  >  ' 

i  ^  rj/.'y,  j  ^  L 


J 

K  +  i  <  J  ^  Ki-L 
J  ^  X^L 


In the preconditioning sequence, a stimulus S_j is picked at
random from S_1, ..., S_K, and this is followed by its transform, T(S_j).
Then another stimulus is picked at random from S_1, ..., S_K and this is
followed by its transform, and so on. Then
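the probability of occurrence is 1/2K for each of the 2K preconditioning
stimuli and transforms, and 0 for the test stimuli. A stimulus S_j with
j ≤ K is always followed by its transform T(S_j) (transition probability 1),
while a transform (j > K) is followed by each of S_1, ..., S_K with
probability 1/K. All other transition probabilities are 0.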

We  also  specify  that  no  A -unit  is  activated  by  more  than  /z  (" /z  <  K^L)  of 
the  stimuli  5,  ,  ,  ,  •  •  •  .  ft- 


J  ^ 

J  > 


From  Equation  (16.16)  we  obtain 


,  •  i"  "  "  ^  .F 


2K  y 

;  2.  Z.  ^y.j  "V"  )\  (16.21) 

1  i 


Hence  we  have  the  following  results: 
i)  If  /  j  ^  rj  .  y  •'  ■  ■-  ,  then  A 


(16.22) 


In words, if the stated inequality holds, then, in the terminal state, S_x
activates all those elements originally activated either by itself or by any
of the transforms T(S_1), ..., T(S_L).

ii)  If  2^ ,  then  A  A  (S  ^)  ->  U 

iii)  If  both  inequalities  hold,  then  A^^  ^)  ' 

J  A  L 




Thus  far,  we  have  considered  the  generalization  of  a  response 
from S_x to the transforms T(S_1), T(S_2), etc. Suppose a response
is associated to T(S_x); we are then interested in determining whether

there  is  any  generalization  in  the  reverse  direction,  i.e.,  to  T  (2^') 

(x‘)  I 

We  can  obtain  'f  from  Equation  (16.21),  with  x  replaced  by  X 

which  yields : 

k' 

_ y 


Consequently, 


iv)  If 


L  r  u- 


G  ,  then 


CO 


G  [5 - 


If inequalities i), ii), and iv) all hold, then the stimulus S_x generalizes to
T(S_1), ..., T(S_L), but the transform T(S_x) = S_x' does not generalize
to the stimulus S_x. Necessary and sufficient conditions that all three
inequalities  hold  are  easily  found;  (With  r  ^  ')  ,  then  iv)  implies  ii)  ). 

.  I  ‘  k  i  '  i  J 

a  /■’  >  /  i - L. 

-  -L// 

b)  ^  ^  _ 

Lj_  ^  ^  d  K  ( K  t  Jji )  <j  -h  L  f'  JM 

In particular, let L = 1. Then A^(2)(S_x) = A_0^(2)(S_x) ∪ A_0^(2)(T(S_1)).

Thus,  due  to  the  intersection  between  and  , 

the  test  stimulus  generalizes  to  its  transform,  even  though  neither  the  test 
stimulus  nor  its  transform  has  occurred  during  the  preconditioning  sequence. 
Under  these  conditions,  the  perceptron  will  behave  in  much  the  same  manner 
as  the  specially  constrained  similarity-biased  perceptron  of  Chapter  15.  The 
actual magnitude of the bias thus induced, in a simple discrimination experiment,
can be calculated as follows.
Let S_y be another test stimulus, like S_x, but its chief
intersection is with S_2, say also q + r. Then if conditions a) and b) are
satisfied (with L = 1),

    A^(2)(S_x) = A_0^(2)(S_x) ∪ A_0^(2)(T(S_1))

and

    A^(2)(S_y) = A_0^(2)(S_y) ∪ A_0^(2)(T(S_2)).

Suppose the perceptron has zero initial values on the A to R-unit
connections. Let S_x be shown, and all active A-R connections reinforced
by +1. Then let S_y be shown, and all active A-R connections reinforced
by -1. Now if the perceptron is shown T(S_x) (which it has never seen
before) the input to the R-unit is equal to the number of A-units in
A^(2)(T(S_x)) ∩ [A_0^(2)(T(S_1)) ∪ A_0^(2)(S_x)] minus the number of
A-units in A^(2)(T(S_x)) ∩ [A_0^(2)(T(S_2)) ∪ A_0^(2)(S_y)],
which in general is positive; while if it is shown T(S_y) the signal to the
R-unit is negative. Thus the discrimination which was taught for S_x and
S_y carries over to T(S_x) and T(S_y).
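The counting argument above can be illustrated with ordinary set operations. In the sketch below every A-unit index set is invented for illustration; the only structure taken from the text is that the reinforced set for each test stimulus is the union of its own initial set with the initial set of the transform it chiefly intersects, and that reinforcement is +1 for S_x and -1 for S_y.

```python
def r_unit_input(active, plus_set, minus_set):
    """Signal to the R-unit: +1 for every active unit reinforced with S_x,
    -1 for every active unit reinforced with S_y (A-R values start at zero)."""
    return len(active & plus_set) - len(active & minus_set)

# Hypothetical initial responding sets (A-unit indices chosen arbitrarily).
A0_Sx, A0_TS1 = {1, 2, 3}, {4, 5, 6}
A0_Sy, A0_TS2 = {7, 8, 9}, {10, 11, 12}

# Terminal responding sets for the test stimuli with L = 1.
A_Sx = A0_Sx | A0_TS1
A_Sy = A0_Sy | A0_TS2

# T(S_x) chiefly overlaps T(S_1); T(S_y) chiefly overlaps T(S_2).
A_TSx = {5, 6, 10, 13}
A_TSy = {4, 11, 12, 14}

signal_x = r_unit_input(A_TSx, A_Sx, A_Sy)   # 2 - 1
signal_y = r_unit_input(A_TSy, A_Sx, A_Sy)   # 1 - 2
print(signal_x, signal_y)
```

With these made-up sets the signal is positive for T(S_x) and negative for T(S_y); the sign depends only on which preconditioning transform each test transform overlaps more.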


In  the  above  analysis,  it  was  postulated  that  the  test  stimuli 
should  have  larger  intersections  with  some  of  the  preconditioning  stimuli 
than  with  others.  This  assumption  is  crucial  for  the  predicted  effect  to 
occur.  The  reader  will  recall  from  the  discussion  of  the  last  chapter,  that 
in  a  perceptron  with  an  infinite  retina,  no  similarity  bias  could  be  obtained 
between  random  stimuli  because  the  distribution  of  their  intersections  had 
zero  variance.  The  same  situation  holds  here.  If  the  preconditioning  stimuli 
are  random  dot  patterns,  and  the  retina  is  infinite,  then  every  preconditioning 
stimulus will have exactly the same intersection with the test stimulus S_x,
and the required bias cannot occur. In a finite retina, however, the intersections
will be binomially distributed (as in the analysis of Chapter 15), and
the predicted effect will be obtained.




We also note an advantage, as before, if compact, coherent
stimuli are employed for preconditioning and as test stimuli. In this case,
even in an infinite retina, the distribution of intersections will have non-zero
variance, and the test stimulus will tend to be more closely related to some
preconditioning stimuli than to others. As long as two test stimuli, S_x and
S_y, do not intersect the same sets of preconditioning stimuli to the same
degree, they can be discriminated in the terminal state of the system (provided
the required parametric conditions are satisfied), but each will generalize to
its transform. Thus the claim made for the performance of such a system in
Experiment 11 has been verified in principle. Quantitative studies of actual
cases are not yet complete, but similar experiments with cross-coupled
systems (to be presented in Chapter 19) suggest that highly satisfactory results
can, in fact, be obtained in practice.

The asymmetrical generalization from S_x to T(S_x), but not
from T(S_x) to S_x, can, of course, be overcome by employing a symmetrical
preconditioning sequence, in which a stimulus is as likely to be followed by
the inverse transformation, T^(-1)(S_i), as by T(S_i).

For  instance,  take  •  jf,--- 

’  'V,  ! 

r  ^  ^ 

'  -n  f 

J 

1  r- 

1 

r  •  r  c  '  r  ' 

,  .  .  .  ,  Ji  ^  \  ^  J  ,  \  ^  fy  J  j 

where 

K 

n  /  .  Let 

(I'l 

<’:■  ■  '  '1  ■  </•  •  V 

•,i  >  n,  /  ■ 

P;  ■  /  kA 
    p_jk = p                   for j ≤ K, k = K + j
           p                   for j > K, k = j - K
           (1 - p)/(2K - 1)    for j ≤ K, k ≠ K + j
           (1 - p)/(2K - 1)    for j > K, k ≠ j - K

Let w = p - (1 - p)/(2K - 1); then the p_jk can be
expressed as follows. For 1 ≤ j ≤ K, 1 ≤ k ≤ K, we have

    p_(j, k) = p_(j+K, k+K) = r
    p_(j, K+k) = p_(K+j, k) = r + w δ_jk

where r = (1 - w)/2K = (1 - p)/(2K - 1). This means that the transition
probability from a stimulus to its transform, or vice versa, is r + w,
while for any two unrelated stimuli, the transition probability is r.
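The relations among p, r, and w can be verified numerically. The sketch below assembles the 2K-by-2K transition matrix in block form; the function name and the values K = 3, p = 0.8 are assumptions chosen for illustration.

```python
import numpy as np

def symmetric_transitions(K, p):
    """Transition matrix for the symmetrical preconditioning sequence.

    Indices 0..K-1 stand for S_1..S_K and K..2K-1 for their transforms.
    A stimulus and its transform follow one another with probability
    r + w = p; every other transition has probability r.
    """
    r = (1 - p) / (2 * K - 1)
    w = p - r
    ones = np.ones((K, K))
    eye = np.eye(K)
    P = np.block([[r * ones, r * ones + w * eye],
                  [r * ones + w * eye, r * ones]])
    return P, r, w

K, p = 3, 0.8
P, r, w = symmetric_transitions(K, p)

rows_ok = np.allclose(P.sum(axis=1), 1.0)    # 2K*r + w = 1 for every row
pair_ok = np.isclose(P[0, K], p)             # S_1 -> T(S_1) has probability p
r_ok = np.isclose(r, (1 - w) / (2 * K))      # the two expressions for r agree
print(rows_ok, pair_ok, r_ok)
```

Each row sums to 2Kr + w, which equals 1 by the definition of r, so the matrix is a proper transition matrix for any K and any p between 0 and 1.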
Assuming S_i ∈ {S_j}, we have


7 


K  K 

I  (cj  -h  Acr-)(ri^  (o6^'^^))  +  (r-hurcfj^)  cl)(i 


2K(f 


•  =  /  ^  =  1 


a + K) ) 

ai  ) 


+  fj,  i^(r  ^  xjj-cfj^)  (P  )  +  r  (p  {oL' ^  ^  ^ 


(i)  ^ 


K  K 


^  ^  2K(f  ^  ^  1  " 

;-/  ^  =  /  I 


0  r 


0  cy 


I  Ad'i-  ^r(t){(y'^^)  i  I  r  +  vxd-^^)  (p 


K 

/  1  V  / 

^  ■■  ,  — ^  y  r  -h  iu  I  i  r  '  /  C-  '  oi 

4-1  ' 


fS) 


2Krf 


V-  0)  ry. 


(4rKj\\,  .  r/Aiix  /  (K^i) 

f - 0  1  CM 

2  Kef  '  ^ 


Thus if p (or w) is nearly 1 and Δ is large, S_i will
generalize to its transform, and conversely T(S_i) will generalize to S_i
since


(K^  i) 

T 


xKrf 


{■JKc) 


■h 


fj  A/JX 

2  Kef 


o(AA 


To  get  the  specific  form  of  the  conditions  for  such  generalization  to  occur, 

K 

we  extract  the  term  for  f  -  I  in  2^  and  put  it  with  the  second  term.  This 

^  ' 

gives  the  first  required  inequality, 


    (η/2Kδ)(2Kqr + qw + Δr + Δw) ≥ θ

or, replacing r and w in terms of p, and 2K by n, we get the
condition

i) If η(q + Δp)/δn ≥ θ, then A^(2)(S_i) ⊇ A_0^(2)(S_i) + A_0^(2)(T(S_i)).

The second required inequality turns out to be

    (η(K - 1)/2Kδ)(2Kqr + qw + Δr) < θ

or, replacing r and w in terms of p, we get

ii) If (n - 2)[q(n - 1) + Δ(1 - p)]η / 2n(n - 1)δ < θ, then
A^(2)(S_i) ⊆ A_0^(2)(S_i) + A_0^(2)(T(S_i)).

iii) If both inequalities hold, then A^(2)(S_i) = A_0^(2)(S_i) + A_0^(2)(T(S_i)).

Necessary and sufficient conditions that both inequalities hold, given n > 4,
are

a) p > (n - 2)/(3n - 4)

b) q/Δ < [p(3n - 4) - n + 2] / (n - 1)(n - 4)

c) η/δθ must be so chosen as to satisfy i) and ii).

For n = 4, these conditions are satisfied if p > 1/4 and
4/(q + Δp) ≤ η/δθ < 12/[3q + Δ(1 - p)]
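The two inequalities can be read as a lower and an upper bound on the single ratio η/δθ, and a small function makes the feasibility condition easy to explore. The sketch assumes the reading of the inequalities as i) η(q + Δp)/δn ≥ θ and ii) (n - 2)[q(n - 1) + Δ(1 - p)]η / 2n(n - 1)δ < θ, with all parameters positive; the function name and the numerical values are illustrative.

```python
def ratio_bounds(n, p, q, big_delta):
    """Bounds on eta / (delta * theta) implied by inequalities i) and ii).

    i)  requires ratio >= n / (q + big_delta * p)
    ii) requires ratio <  2n(n-1) / ((n-2) * (q*(n-1) + big_delta*(1-p)))
    Symmetric generalization is possible only if lower < upper.
    """
    lower = n / (q + big_delta * p)
    upper = 2 * n * (n - 1) / ((n - 2) * (q * (n - 1) + big_delta * (1 - p)))
    return lower, upper

# n = 4 with p well above 1/4: a non-empty range of eta/(delta*theta) exists.
lo_ok, hi_ok = ratio_bounds(n=4, p=0.9, q=0.1, big_delta=1.0)
feasible = lo_ok < hi_ok

# n = 4 with p below 1/4: the two bounds cross and no ratio satisfies both.
lo_bad, hi_bad = ratio_bounds(n=4, p=0.2, q=0.1, big_delta=1.0)
infeasible = lo_bad >= hi_bad
print(feasible, infeasible)
```

For n = 4 the bounds reduce to 4/(q + Δp) and 12/[3q + Δ(1 - p)], and they leave a gap exactly when p > 1/4, in agreement with the special case stated above.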
16.6 Analysis of Value-Conserving Models


In dealing with simple perceptrons, a single value-conserving
model, the γ-system, has been considered. In this system, the total
value of the set of input connections to an A-unit is conserved. In four-layer
and cross-coupled perceptrons two types of value-conserving systems are of
interest: the γ-system, defined as before (where the sum of the input values
is held constant), and the Γ-system, where value is conserved over the set of
output connections from an A-unit, rather than the inputs. In the perceptrons
to be considered in the following chapters, this second system appears to offer
important advantages in performance, and will generally be preferred over the
γ-system.

The most important difference between the γ-system and the
Γ-system is that the latter tends to activate those A-units which would
respond to the most probable successor of the present stimulus, whereas the
γ-system tends to activate the set of A-units which respond to the stimulus
for which the present stimulus is the most probable predecessor. The
difference between these two situations can be seen from the following example.
Suppose there are three stimuli, A, B, and C, with transition probabilities as
shown in the following diagram:


In this case, with the γ-system, we would expect the set of A-units
responding to stimulus A to become most closely associated to the set
responding to stimulus C, since A is the only possible predecessor of C,
whereas B can be preceded by either A or C. In a Γ-system, on the other
hand, the set responding to A would be most closely coupled to the set
responding to B, and might even develop inhibitory connections to the set
responding to C, since B is the most common successor of A. Thus the
Γ-system tends to be predictive, tending to anticipate the most likely
successor of the present stimulus, whereas the γ-system tends to anticipate
the stimulus which is most likely to be preceded by the present stimulus.
As shown above, this latter choice is not necessarily a good prediction of
the next event.
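The distinction between "most probable successor" and "stimulus for which the present stimulus is the most probable predecessor" can be made concrete with a small Markov chain. The transition probabilities below are hypothetical, chosen only to agree with the verbal description (A is the only predecessor of C, B can follow either A or C, and B is the most common successor of A); they are not taken from the original diagram.

```python
import numpy as np

states = ["A", "B", "C"]
# P[i, j] = probability that stimulus i is followed by stimulus j (hypothetical values).
P = np.array([[0.0, 0.7, 0.3],   # A -> B mostly, sometimes C
              [1.0, 0.0, 0.0],   # B -> A
              [0.0, 1.0, 0.0]])  # C -> B

def stationary(P):
    """Long-run occurrence probabilities: left eigenvector of P for eigenvalue 1."""
    vals, vecs = np.linalg.eig(P.T)
    v = np.real(vecs[:, np.argmax(np.real(vals))])
    return v / v.sum()

pi = stationary(P)
joint = pi[:, None] * P                    # joint[i, j] = Pr(previous = i, next = j)
predecessor = joint / joint.sum(axis=0)    # predecessor[i, j] = Pr(previous = i | next = j)

a = states.index("A")
successor_choice = states[int(np.argmax(P[a]))]              # what a predictive system favours
predecessor_choice = states[int(np.argmax(predecessor[a]))]  # what the other system favours
print(successor_choice, predecessor_choice)
```

Here the most probable successor of A is B, while the stimulus most certain to have been preceded by A is C, so the two criteria pick different associates for the same stimulus.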

16.6.1 Analysis of γ-systems

The differential equation for the γ-system is identical with
(16.11), except that the constants C_ij are now equal to


i  -  I 


The  negative  term,  -  V;  >  is  familiar  from  previous  analyses  of  the 
γ-system, and represents the quantity subtracted to balance the gain
in  value  of  the  active  connections.  It  will  be  recalled  that  for  a  Poisson 
model,  Q-^  -  Q-  is  always  equal  to  or  greater  than  zero,  so  that  the 

expected value of C_ij will remain positive, and the previous analysis
(Section 16.2.1) applies without modification. More generally, however, and
for a binomial model in particular, the C_ij may be negative, and the
previous analysis must be reexamined to see how this affects the situation.




To begin with, it no longer follows that the solution will be
monotone, since different combinations of positive and negative C_ij's may
be picked up in equation (16.11), depending on which φ's are currently non-zero.
Since the solution is non-monotone, it also does not follow that a
solution will occur in n steps, or that the solution of the iteration equation
(16.13) is minimal.


While  we  are  unable,  at  this  time,  to  provide  any  short-cut 
method of finding the steady state solution (if one exists) for the γ-system,
it  is  possible  to  compute  a  time -dependent  solution  by  the  following  procedure. 
We  note,  first,  that  the  solution  is  piecewise  exponential,  as  in  the  case  of  the 
"V  -system,  and  that  the  time  constants  for  all  are  equal.  This  means 

that  we  can  readily  determine  which  will  be  the  first  to  cross  the  level 

of  6’  ,  by  computing  the  initial  asymptotes,  for  all  J  .  The  ^ 

with  the  highest  value  of  will  change  most  rapidly.  If  the  initial 


f  I  I  ■ 

^alue  of  ,  and  is  negative, 


will  immediately  go 


to  0  .  If  no  A4  is  negative,  then  the  first  change  to  occur  will  be  for  some 
to  change  from  0  to  I ,  and  this  will  occur  for  that  j  for  which  is 

greatest.  Having  thus  obtained  the  first  discontinuity  point,  ,  we  can 

compute  the  values  of  all  ^  >  and  determine  the  next  (p  to  change. 


This  is  done  by  computing  the  function 


A/ 


(i) 


y-i, 


gT 


(16.23) 


Joseph  has  pointed  out  that  singularities  are  possible.  For  example,  with 
0  t  ,  (f  =  t  ,  /J !  /  ,  and  •  0  ,  if  C  we  have  {at  t  3/^  ) 

^  then  -  (fi  while  ~  Tf 2  •  Thus 

immediately  falls  below  1,  hence  back  to  the  original  equation,  which  brings 
it  back  to  1  again.  While  'fy  thus  fluctuates  about  1,  the  future  history  of 
is  not  determined. 
for all i. Note that ^ will be greater than 1 only if the numerator and
denominator  agree  in  sign,  and  (M/d'  ~  T)  >  |0  ~  /3  ~  'f\  •  these 

conditions  are  met  (i.e.,  if  '^/^  >  /  )>  cp{(X^-^^)  will  change  value  some¬ 
time  before  reaches  its  new  asymptote.  Thus,  by  finding  the  value 

(or  values)  of  i  for  which  is  maximum,  at  the  discontinuity  time  t.^ 

we  can  always  determine  the  next  (j)  to  change.  Introducing  this  new  0 
gives  us  a  new  set  of  asymptotes,  ,  and  the  process  can  be 

continued.  The  values  of  the  at  the  discontinuity  times  can  be 

readily  calculated  from  the  exponential  solution: 


-  r/t/ 


(16.24) 


where  the  discontinuity  time,  t ^  ,  is  obtained  by  solving  the  equation  for 

the  next  to  cross  threshold,  that  is 


16.6.2 Analysis of Γ-systems

The Γ-system is similar to the γ-system, except that
after each increment of reinforcement, the total value is restored to its
former level by subtracting the net gain uniformly from the set of output
connections from an A-unit, instead of the input connections. The differential
equation now takes the form


U) 


oL  t 


- 


.d2) 


(16.26) 


The same uncertainties as to existence of steady state solutions
and difficulties of computation occur here as in the case of the γ-system
analysis. A time-dependent solution can again be computed, piecewise, by
the same procedure as above. In Chapter 19, we shall reconsider the
Γ-system, in connection with cross-coupled perceptrons.


16.7  Functionally  Equivalent  Models 


In  Ref.  41,  Joseph  has  presented  an  analysis  of  a  perceptron  with 
"binodal  A-units",  which  is  now  seen  to  be  functionally  equivalent  to  a  variation 
of  the  system  analyzed  above.  In  the  binodal  model,  there  is  only  a  single 
layer  of  A-units,  but  each  A-unit  receives  two  logically  distinct  sets  of  input 
connections  and  has  a  separate  threshold  for  each  set.  The  first  set  of 
connections  is  fixed  in  value,  and  activates  the  A-unit  according  to  the  usual 
rules.  The  second  set  consists  of  a  single  connection  from  every  sensory 
point  in  the  retina,  and  is  variable  in  value.  The  reinforcement  rule  for 
these  variable  connections  is  that  if  the  A-unit  is  active  at  time  t  ,  and  the 
retinal origin point of one of the variable connections is active at t + 1, the

variable connection gains an increment in value. At the same time, all
variable connections tend to decay at a fixed rate, δ. This is equivalent
to a four-layer model in which each A^(2) unit receives its fixed connection
from an A^(1) unit with a normal number of input connections and threshold θ,
and receives variable connections from other A^(1) units, each having a
single excitatory input connection, and a threshold of 1. The main difference
from the above analysis would then be that the A^(2) unit responds to the
logical sum, rather than the algebraic sum, of the inputs from the fixed
connections and the variable connections, i.e., the A^(2) unit is active if its
fixed connection (the β-component) is active, or if the sum of the variable
connections (the γ-component) ≥ θ. As this writer had previously
predicted on heuristic grounds, Joseph has successfully demonstrated that
similarity generalization will tend to occur in the binodal model, after a
preconditioning sequence analogous to those discussed above. In this system,
the set of fixed connections acts as a "template", and the variable connections
tend to adapt themselves to an origin configuration which resembles the fixed
set under the transformation T. The reader is referred to Reference 41 for a
quantitative analysis.
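The difference between the logical-sum rule of the binodal unit and the algebraic-sum rule of the four-layer model can be stated in a few lines. Both functions below are illustrative sketches: the names, argument conventions, and sample values are assumptions, and the binodal rule is reduced to the single condition quoted above.

```python
def binodal_active(fixed_active, variable_values, retinal_activity, theta):
    """Logical sum: the unit fires if its fixed (beta) component is active,
    OR if the summed variable (gamma) component reaches theta."""
    gamma = sum(v for v, on in zip(variable_values, retinal_activity) if on)
    return fixed_active or gamma >= theta

def algebraic_active(beta, gamma, theta):
    """Algebraic sum: the unit fires if beta + gamma together reach theta."""
    return beta + gamma >= theta

values = [0.5, 0.25, 0.75]        # hypothetical values of three variable connections
activity = [True, False, True]    # which retinal origin points are currently active

print(binodal_active(False, values, activity, theta=1.0))   # gamma = 1.25 reaches theta
print(binodal_active(False, values, activity, theta=2.0))   # neither component suffices
print(algebraic_active(beta=1.0, gamma=1.25, theta=2.0))    # components combine to reach theta
```

Under the logical-sum rule a sub-threshold variable component cannot combine with a sub-threshold fixed component to fire the unit, whereas under the algebraic-sum rule it can.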

While it was assumed that the models analyzed in the preceding
sections had a complete set of connections (from every A^(1) unit to every
A^(2) unit), a system which merely has a large number of input connections
to each A^(2) unit, originating from randomly selected A^(1) units, can be
seen to be equivalent in all of its essential properties. For such a system,
the Q_ij matrix, representing the expected values of the fractions of A^(1)
units responding to S_i and S_j, would have the same equations as before,
except that N_a must be replaced by the number of variable connections to
each A^(2) unit.

In the following chapter, it will be shown that a form of weakly
cross-coupled system, in which there are no closed loops, is also virtually
equivalent to the model analyzed in this chapter, and can be represented by
the same equations, with a slight reinterpretation of the β-component of
the input signals to the A-units.
17. OPEN-LOOP CROSS-COUPLED SYSTEMS


The most interesting features of cross-coupled perceptrons are
those which result from the possibility of closed feed-back loops, or
cycles, in the network. It is possible, however, to design a cross-coupled
system with no closed loops, and such a system has a number of important
features, including the ability to act as an adaptive similarity-generalizing
system equivalent to the perceptrons of Chapter 16, and increased economy
and versatility in general classification problems of the sort considered in
Chapter 5. These properties will be considered briefly, in this chapter,
before proceeding to closed-loop systems, which represent a more challenging
problem in analysis.

17.1 Similarity-Generalizing Systems: An Analog of the Four-Layer System


The three-layer perceptron shown in Fig. 45 is directly comparable
to the four-layer system considered in the last chapter. The A-units are
divided into two subsets, called A' and A". All A-units receive fixed
connections from the retina, but only the A" units have connections to the
R-units, the A' units sending their output signals to the A" units. Each A' unit
is connected (in a fully-coupled model) to all A" units, and each A" unit is
connected to all A' units. The rule for modifying the connections from A'
to A" units is identical with the rule for modifying A^(1) to A^(2) connections,
in the four-layer system considered previously: If the origin of the connection
is active at time t, and the terminus is active at t + 1, the connection gains a
quantity η. All inter-A-unit connections decay at a rate δ, as before.
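The modification rule for the A' to A" connections can be sketched as a single discrete-time update. The step below is only an approximation of the continuous rule under stated assumptions: time advances in unit steps, decay removes a fraction δ of each value per step, and the array layout V[i, j] (origin i, terminus j) is chosen for convenience.

```python
import numpy as np

def update_connections(V, origin_prev, terminus_now, eta, delta):
    """One time step of the inter-A-unit rule.

    V[i, j] is the value of the connection from origin unit i to terminus
    unit j. A connection gains eta when its origin was active at time t and
    its terminus is active at t + 1; every connection decays at rate delta.
    """
    gain = eta * np.outer(origin_prev, terminus_now)
    return V - delta * V + gain

V = np.zeros((2, 2))
# Origin unit 0 active at t, terminus unit 1 active at t + 1.
V = update_connections(V, np.array([1, 0]), np.array([0, 1]), eta=0.5, delta=0.1)
after_gain = V[0, 1]
# A following step with no activity: only decay acts.
V = update_connections(V, np.array([0, 0]), np.array([0, 0]), eta=0.5, delta=0.1)
after_decay = V[0, 1]
print(after_gain, after_decay)
```

Only the connection whose origin preceded its terminus is strengthened, which is what makes the learned association directional.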
Figure 45 OPEN-LOOP CROSS-COUPLED SYSTEM (COMPARE FIGURE 42). BROKEN LINES
INDICATE VARIABLE CONNECTIONS.


Clearly, the only difference between this model and the
previous one is that the β-component, instead of originating from one
of the A^(1) units, comes direct from the retina, and consequently can take
on more than two values. The differential equation (16.11) and the equilibrium
equation (16.12) thus apply without modification to this system
(where the A' set is equated with the A^(1) set, and the A" set with the
A^(2) set). The additional freedom in choice of β-values means that the
sets designated A_0^(2)(S_i), representing sets of A^(2) units whose β-value
in response to S_i is ≥ θ, must now be fractionated into subsets for
each possible value of β, and the history of each such subset (having a
given β-vector) must be followed separately. Thus the full designation
of such a subset would be A_0^(2)(β_i, S_i). Apart from this further
fractionation of the A-set, the same analysis holds as in the last chapter,
and much the same results would be expected.


17.2 Comparison of Four-Layer and Open-Loop Cross-Coupled Models


A  numerical  comparison  of  the  performance  of  the  perceptrons 
considered  in  this  and  the  preceding  chapter  will  be  based  on  the  following 
experiment: 


EXPERIMENT 12: Take an environment of four stimuli, $S_1, \ldots, S_4$, each having retinal area $R = .2$. The intersections $C_{13}$ and $C_{24}$ are each equal to $.1$, and all other intersections are zero. The perceptron is exposed to the following sequence, which is repeated until a steady state is attained:

$S_1\, S_2\, S_1\, S_2 \ldots S_1\, S_2 \;\; S_3\, S_4\, S_3\, S_4 \ldots S_3\, S_4$

This sequence can be considered to consist of two events, the first consisting of the alternating pair $S_1\, S_2 \ldots$ with a duration of $10\tau$, and the second consisting of $S_3\, S_4 \ldots$, also with a duration of $10\tau$. A matrix of Q-functions is obtained at the beginning and end of the preconditioning procedure, to compare steady state with initial conditions.


The  relationship  among  the  four  stimuli  can  be  seen  from  the 
following  Venn-diagram  of  the  retinal  sets,  where  the  double -headed  arrows 
indicate  the  oscillating  pairs  of  stimuli,  and  the  number  in  each  cell 
indicates its area.

The initial and terminal Q-matrices have been computed for a four-layer and an open-loop cross-coupled perceptron, as a function of the parameter $N_a \eta / \delta$. In both models the parameters of the $A'$ units (or of all A-units, in the cross-coupled case) were $x = 3$, $y = 0$, and $\theta = 2$, with a binomial model. In the four-layer model, $\theta^{(2)}$ was also taken to be 2, so that the systems are directly comparable.
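The entries of the initial Q-matrix can be checked directly from the stated parameters. The sketch below is an illustration only, assuming the binomial model with unit-valued connections, a unit that responds when at least $\theta$ of its $x$ excitatory connections fall inside the stimulus, and $y = 0$; the function names `q_single` and `q_pair` are hypothetical and do not come from the text.

```python
from itertools import product
from math import comb


def q_single(x, theta, area):
    """Probability that at least `theta` of `x` connections land in the stimulus."""
    return sum(comb(x, k) * area**k * (1 - area) ** (x - k)
               for k in range(theta, x + 1))


def q_pair(x, theta, area_i, area_j, overlap):
    """Probability that one A-unit reaches threshold for both stimuli."""
    # Each connection originates in exactly one of four retinal regions.
    probs = {
        "common": overlap,
        "only_i": area_i - overlap,
        "only_j": area_j - overlap,
        "neither": 1 - area_i - area_j + overlap,
    }
    total = 0.0
    for regions in product(probs, repeat=x):
        hits_i = sum(r in ("common", "only_i") for r in regions)
        hits_j = sum(r in ("common", "only_j") for r in regions)
        if hits_i >= theta and hits_j >= theta:
            p = 1.0
            for r in regions:
                p *= probs[r]
            total += p
    return total


diagonal = q_single(3, 2, 0.2)                 # each stimulus with itself
intersecting = q_pair(3, 2, 0.2, 0.2, 0.1)     # pairs with intersection .1
disjoint = q_pair(3, 2, 0.2, 0.2, 0.0)         # pairs with no intersection
print(round(diagonal, 3), round(intersecting, 3), disjoint)
```

With $x = 3$ and $\theta = 2$ a unit cannot have two connections in each of two disjoint stimuli, which is why the non-intersecting pairs start at zero.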

The Q-matrices obtained in this experiment are shown in Tables 5 and 6. The important Q-functions are also shown graphically in Fig. 46, as a function of the parameter $N_a \eta / \delta$. Note that for both models, there is a considerable parametric range within which generalization is much greater for stimuli which belong to the same event than for stimuli from different events. This gain in generalization between $S_1$ and $S_2$, and between $S_3$ and $S_4$, is more than sufficient to offset the handicap of the intersections between $S_1$ and $S_3$, and between $S_2$ and $S_4$, which gives the system an initial disadvantage. The cross-coupled model, while it follows a similar history, has a considerably greater "useful range" than the four-layer model. For the four-layer system, the range of


TABLE 5

Q-MATRICES FOR FOUR-LAYER $\alpha$-PERCEPTRON IN EXPERIMENT 12

(PARAMETERS: $x = 3$, $y = 0$, $\theta = 2$)


INITIAL Q-MATRIX:

    .104   .000   .034   .000
    .000   .104   .000   .034
    .034   .000   .104   .000
    .000   .034   .000   .104

TERMINAL MATRICES FOR:

$77.0 < N_a \eta / \delta < 88.9$

    .104   .070   .034   .000
    .070   .174   .000   .034
    .034   .000   .104   .070
    .000   .034   .070   .174

$88.9 < N_a \eta / \delta < 166.6$

    .174   .140   .034   .000
    .140   .174   .000   .034
    .034   .000   .174   .140
    .000   .034   .140   .174

$N_a \eta / \delta > 166.6$

    .314   .280   .034   .280
    .280   .314   .280   .034
    .034   .280   .314   .280
    .280   .034   .280   .314
TABLE 6

Q-MATRICES FOR OPEN-LOOP CROSS-COUPLED $\alpha$-PERCEPTRON IN EXPERIMENT 12

(PARAMETERS: $x = 3$, $y = 0$, $\theta = 2$)


INITIAL Q-MATRIX:

    .104   .000   .034   .000
    .000   .104   .000   .034
    .034   .000   .104   .000
    .000   .034   .000   .104

TERMINAL MATRICES FOR:

$38.5 < N_a \eta / \delta < 44.5$

    .122   .018   .034   .000
    .018   .104   .000   .034
    .034   .000   .122   .018
    .000   .034   .018   .104

$44.5 < N_a \eta / \delta < 77.0$

    .122   .036   .034   .000
    .036   .122   .000   .034
    .034   .000   .122   .036
    .000   .034   .036   .122

$77.0 < N_a \eta / \delta < 83.3$

    .174   .082   .034   .000
    .082   .122   .000   .034
    .034   .000   .174   .082
    .000   .034   .082   .122

$83.3 < N_a \eta / \delta < 88.9$

    .183   .097   .034   .027
    .097   .140   .027   .034
    .034   .027   .183   .097
    .027   .034   .097   .140

$88.9 < N_a \eta / \delta < 117.6$

    .192   .131   .034   .036
    .131   .192   .036   .034
    .034   .036   .192   .131
    .036   .034   .131   .192

$117.6 < N_a \eta / \delta < 166.6$

    .210   .176   .034   .072
    .176   .210   .072   .034
    .034   .072   .210   .176
    .072   .034   .176   .210

$166.6 < N_a \eta / \delta < 235.2$

    .262   .228   .034   .176
    .228   .262   .176   .034
    .034   .176   .262   .228
    .176   .034   .228   .262

$N_a \eta / \delta > 235.2$

    .314   .280   .034   .280
    .280   .314   .280   .034
    .034   .280   .314   .280
    .280   .034   .280   .314
$N_a \eta / \delta$ for which the system tends to classify events "correctly" is 77.0 to 166.6, while for the cross-coupled model this range is extended to 38.5 to 235.2. Thus the cross-coupled model begins to show the generalization effects earlier, and saturates later than the four-layer system. Moreover, the transition occurs more gradually, in eight steps for the cross-coupled system as opposed to three for the four-layer model.
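The two "useful ranges" quoted above can be recovered from the tabulated matrices under a simple reading of "correct" classification. The sketch below is an assumption, not the author's stated criterion: it counts a terminal matrix as correct when the same-event entry $Q_{12}$ exceeds $Q_{14}$, the entry for two stimuli that neither intersect nor share an event. Each tuple holds the lower edge of a parameter interval followed by $Q_{12}$ and $Q_{14}$ as read from Tables 5 and 6; the name `useful_range` is hypothetical.

```python
# (lower edge of the N_a*eta/delta interval, Q12, Q14) for each terminal matrix
FOUR_LAYER = [
    (77.0, 0.070, 0.000),
    (88.9, 0.140, 0.000),
    (166.6, 0.280, 0.280),
]
CROSS_COUPLED = [
    (38.5, 0.018, 0.000),
    (44.5, 0.036, 0.000),
    (77.0, 0.082, 0.000),
    (83.3, 0.097, 0.027),
    (88.9, 0.131, 0.036),
    (117.6, 0.176, 0.072),
    (166.6, 0.228, 0.176),
    (235.2, 0.280, 0.280),
]


def useful_range(steps):
    """Return (low, high) bounds of the intervals where Q12 exceeds Q14."""
    good = [i for i, (_, q12, q14) in enumerate(steps) if q12 > q14]
    if not good:
        return None
    low = steps[good[0]][0]
    nxt = good[-1] + 1
    # The range ends where the next (saturated) interval begins.
    high = steps[nxt][0] if nxt < len(steps) else float("inf")
    return low, high


print(useful_range(FOUR_LAYER))
print(useful_range(CROSS_COUPLED))
```

Under this reading the saturated matrices fall outside the range because same-event and different-event generalization become equal.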

The matrices shown here assume $\alpha$-system reinforcement. A $\gamma$ or $\Gamma$-system, with the four-layer model, eliminates all $A^{(2)}$ activity immediately, in this experiment. In the cross-coupled model, however, activity is not completely eliminated, and the terminal Q-matrices obtained for a $\gamma$-perceptron are shown in Table 7. Note that the bias favoring $Q_{13}$ and $Q_{24}$ is eliminated for most values of $N_a \eta / \delta$, and that the "dynamic range" is greater than in the $\alpha$-system. The $\Gamma$-system, illustrated in Table 8, is similar to the $\gamma$-perceptron for small values of $N_a \eta / \delta$, but it appears to "saturate" more easily.

While the performance of the cross-coupled perceptron closely resembles the system in Chapter 16, it is a somewhat more satisfying model from the standpoint of biological plausibility and parsimony, since it does not require the assumption of a special set of fixed connections from $A^{(1)}$ to $A^{(2)}$ units in addition to the variable connections. That assumption was necessary, in the four-layer system, to provide a "template" for the organization of similar $A^{(1)}$ units to be connected to each $A^{(2)}$ unit, and in order to prevent all connections from decaying to zero value. In the present scheme, all S-A connections are fixed, and all other connections variable, yielding a conceptually simpler organization.
TABLE 7

Q-MATRICES FOR OPEN-LOOP CROSS-COUPLED $\gamma$-PERCEPTRON IN EXPERIMENT 12

(Parameters: $x = 3$, $y = 0$, $\theta = 2$)


INITIAL Q-MATRIX:

    .104   .000   .034   .000
    .000   .104   .000   .034
    .034   .000   .104   .000
    .000   .034   .000   .104

TERMINAL MATRICES FOR:

$0 < N_a \eta / \delta < 68.7$

    .008   .000   .001   .000
    .000   .008   .000   .001
    .001   .000   .008   .000
    .000   .001   .000   .008

$68.7 < N_a \eta / \delta < 85.8$

    .009   .002   .002   .002
    .002   .009   .002   .002
    .002   .002   .009   .002
    .002   .002   .002   .009

$85.8 < N_a \eta / \delta < 101$

    .019   .012   .008   .008
    .012   .012   .008   .008
    .008   .008   .019   .012
    .008   .008   .012   .012

$101 < N_a \eta / \delta < 152$

    .022   .022   .014   .014
    .022   .022   .014   .014
    .014   .014   .022   .022
    .014   .014   .022   .022

$152 < N_a \eta / \delta < 303$

    .025   .025   .017   .017
    .025   .025   .017   .017
    .017   .017   .025   .025
    .017   .017   .025   .025

$N_a \eta / \delta > 303$

    .030   .030   .030   .030
    .030   .030   .030   .030
    .030   .030   .030   .030
    .030   .030   .030   .030
TABLE 8

Q-MATRICES FOR OPEN-LOOP CROSS-COUPLED $\Gamma$-PERCEPTRON IN EXPERIMENT 12

(Parameters: $x = 3$, $y = 0$, $\theta = 2$)


INITIAL Q-MATRIX:

    .104   .000   .034   .000
    .000   .104   .000   .034
    .034   .000   .104   .000
    .000   .034   .000   .104

TERMINAL MATRICES FOR:

$0 < N_a \eta / \delta < 58.5$

    .008   .000   .001   .000
    .000   .008   .000   .001
    .001   .000   .008   .000
    .000   .001   .000   .008

$58.5 < N_a \eta / \delta < 77.8$

    .009   .002   .002   .002
    .002   .009   .002   .002
    .002   .002   .009   .002
    .002   .002   .002   .009

$77.8 < N_a \eta / \delta < 88.5$

    .019   .012   .008   .008
    .012   .012   .008   .008
    .008   .008   .019   .012
    .008   .008   .012   .012

$88.5 < N_a \eta / \delta < 92.0$

    .022   .015   .014   .014
    .015   .015   .014   .014
    .014   .014   .022   .015
    .014   .014   .015   .015

$92.0 < N_a \eta / \delta < 131$

    .025   .025   .020   .020
    .025   .025   .020   .020
    .020   .020   .025   .025
    .020   .020   .025   .025

$131 < N_a \eta / \delta < 181$

    .028   .028   .026   .026
    .028   .028   .026   .026
    .026   .026   .028   .028
    .026   .026   .028   .028

$N_a \eta / \delta > 181$

    .030   .030   .030   .030
    .030   .030   .030   .030
    .030   .030   .030   .030
    .030   .030   .030   .030




It will be seen in Chapter 19 that this system, with the addition of a unit time-delay (all $\tau_{ij} = 1$), performs identically to a closed-loop fully cross-coupled perceptron for the first two cycles of operation. By further extension of the network along the same lines, it will be shown that additional cycles of closed-loop activity can be duplicated.

17.3  Reduction  of  Size  Requirements  for  Universal  Perceptrons 

In the case of simple perceptrons, it was demonstrated that in order to obtain a "universal perceptron", in which a solution exists for any classification of $n$ stimuli, at least $n$ A-units are required (Theorem 3, Corollary 2, Chapter 5). Now consider an open-loop cross-coupled perceptron, constructed as follows: Let the A-units be numbered in series $a_1, a_2, \ldots, a_{N_a}$, and let $N_a = N_s$ (the number of S-points). The last of these units, $a_{N_a}$, has an output connection to an R-unit. Each A-unit has a variable-valued connection from every S-point, plus one connection for every A-unit prior to itself in the series; i.e., $a_i$ receives a connection from every S-point and from $a_1, a_2, \ldots, a_{i-1}$.

It has been demonstrated by Cameron that for small values of $n$ ($n = 2^{N_s}$) only $\log_2(n)$ A-units are required in order to obtain a universal perceptron, in which a solution exists for all of the $2^n$ possible classifications. This was demonstrated by explicit construction for $n$ as large as 8. At some higher value of $n$, this ceases to be true, although the maximum $n$ for which the observation holds true has not yet been determined.

S.  Cameron,  personal  communication. 
A lower bound for the number of A-units required for a universal perceptron in such a system has been obtained by Joseph (although it is not a least upper bound). The analysis (given in the Appendix of Ref. 41) is based on the Hay-Joseph theorem that the maximum number of orthants achievable by linear combinations of $r$ vectors in $n$-space is approximately $M(n, r) = \dfrac{2\, n^{r-1}}{(r-1)!}$ where $n$ is large, and $r$ is small

relative  to  n  .  An  upper  bound  for  the  number  of  dichotomies  achievable 
with  /V^  A-units  is  found  to  be  M{2  ,  N^  + 2)  . . .  M  N^-h  . 

It is shown that for large $N_a$ the number of possible dichotomies is increasing at a much greater rate than the number of achievable dichotomies, so that there must be some point at which the system ceases to act as a universal perceptron.
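The orthant count underlying this argument can be explored numerically. The sketch below is not taken from Ref. 41: it assumes the standard counting result for an $r$-dimensional subspace in general position in $n$-space, $2\sum_{k=0}^{r-1}\binom{n-1}{k}$, whose leading term for large $n$ is $2n^{r-1}/(r-1)!$. The function names are hypothetical.

```python
from math import comb, factorial


def orthants_exact(n, r):
    """Orthants of n-space met by the span of r vectors in general position."""
    return 2 * sum(comb(n - 1, k) for k in range(r))


def orthants_leading_term(n, r):
    """Large-n approximation, valid when r is small relative to n."""
    return 2 * n ** (r - 1) / factorial(r - 1)


for n in (10, 100, 1000):
    exact = orthants_exact(n, 3)
    approx = orthants_leading_term(n, 3)
    # The achievable count grows polynomially, while 2**n grows exponentially.
    print(n, exact, round(approx / exact, 4), exact < 2 ** n)
```

When $r \ge n$ the exact count reaches $2^n$, so every dichotomy is achievable; for fixed $r$ the count falls behind $2^n$ as $n$ grows, which is the qualitative point of the paragraph above.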
18.  Q-FUNCTIONS FOR CROSS-COUPLED PERCEPTRONS


A general cross-coupled perceptron is illustrated in Figure 47. It consists of three layers of units, with complete freedom of interconnection among the A-units. Due to the likelihood of closed circuits of connections within the network, this is called a closed-loop system.


Figure  47  TYPICAL  CONNECTIONS  IN  A  CLOSED-LOOP  CROSS-COUPLED  PERCEPTRON 


In passing from open-loop to closed-loop networks, several fundamentally new considerations enter into the analysis. In the first place, the state of the network at time $t$ becomes a function, not only of the present sensory input and the momentary values of the connections, but of the preceding sequence of inputs and past activity states as well. The dependence of the system's state upon time-sequences of previous states means that the transmission time, $\tau_{ij}$, which previously played no part or only a minor part in the analysis of system performance, now becomes a parameter to be reckoned with at all times. The question of network stability is also a fundamental one; some cross-coupled networks, once triggered, will explode into total activity which prevents any further stimuli from making any impression at all, others will oscillate, and others will settle down to a stable steady-state condition. In this chapter, we begin by re-examining the concept of Q-functions, in order to provide a means of measuring the response of the network to sequences of stimuli, and comparing its response quantitatively for different stimulus sequences. These new Q-functions will be found to encompass the functions analyzed in Chapter 6 as a special case.

18.1  Stimulus Sequences: Notation


In Chapter 4, a stimulus was defined as any set of input signals to sensory units of a perceptron, excluding the null stimulus. In practice, these signals are generally taken to be 1 or zero. For present purposes, the null stimulus (all signals equal to zero) will be re-admitted as a stimulus, and will be symbolized by 0 when it occurs as part of a sequence. A stimulus sequence, $\mathcal{S}_i = (S_{i1}, S_{i2}, \ldots, S_{im})$, can be an arbitrary series of stimuli which are assumed to occur at successive discrete times $t_0,\ t_0 + \Delta t,\ t_0 + 2\Delta t,\ \ldots,\ t_0 + (m-1)\Delta t$. An arbitrary set of stimulus sequences can be taken to comprise a stimulus-sequence world, for a given perceptron, in the sense of Definition 26 of Chapter 4.
In this and the subsequent chapters, it will be assumed that the transmission time, $\tau_{ij}$, is equal to $\Delta t$ for all connections, $c_{ij}$, and this transmission time will be symbolized in abbreviated form by $\tau$. Consequently, if a stimulus $S_i$ occurs at time $t$, the response to this stimulus in the A-system occurs at time $t + \tau$, and $Q_i$ is interpreted to mean the probability that an A-unit is activated at time $t$ if $S_i$ occurs at time $t - \tau$. In a cross-coupled perceptron, however, $Q_i$ is not a well-defined quantity, since in addition to signals from the retina, an A-unit may receive signals from other A-units at time $t$, so that the response at time $t$ depends both on $S(t - \tau)$ and on the activity state of the association system at $t - \tau$. $Q_i$ is therefore redefined to apply to sequences $\mathcal{S}_i$ of length $m$, which begin at time $t - m\tau$, and terminate at $t - \tau$, with the association system assumed to be totally inactive, or "silent", at time $t - m\tau$. In this case, for a sequence of length 1, $Q_i$ is interpreted in the usual manner, and is represented by the equations of Chapter 6, without modification.

For a general sequence of length $m$, we use the notation $Q_{i_m}$ to designate the probability that an A-unit is active at time $t$, given that the sequence $\mathcal{S}_i$ began at time $t - m\tau$, so that the $m$th member of the sequence occurred at $t - \tau$. More generally, we can write $Q_{i_\mu}$ to designate the probability that an A-unit is active at time $t$ if the sequence began at $t - \mu\tau$, where $\mu$ may be less than, equal to, or greater than $m$. If $\mu$ is less than $m$, this is equivalent to the probability of response to a truncated sequence, containing only the first $\mu$ stimuli of the sequence $(S_{i1}, S_{i2}, \ldots, S_{i\mu})$. If $\mu > m$, we adopt the convention that the sequence $\mathcal{S}_i$ is understood to have been augmented by the addition of $\mu - m$ null stimuli, yielding the sequence $(S_{i1}, S_{i2}, \ldots, S_{im}, 0, \ldots, 0)$. In other words, it is assumed that the sequence $\mathcal{S}_i$ began at $t - \mu\tau$, and that no other inputs occurred through time $t - \tau$, the probability of A-unit activity then being determined for time $t$. In a simple perceptron, this probability would, of course, be zero for $\mu > m$; in a cross-coupled system, however, the presence of persistent cycles of activity, or reverberating loops in the A-system, may maintain $Q_{i_\mu} > 0$ for an indefinite period.
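The truncation and padding convention for the second-order subscript can be stated compactly. The sketch below is illustrative only: stimuli are stood in for by arbitrary labels, the null stimulus by `None`, and the name `effective_sequence` is hypothetical.

```python
NULL = None  # stands for the null stimulus (all sensory signals equal to zero)


def effective_sequence(sequence, mu):
    """Stimuli actually presented when a sequence of length m began mu steps ago.

    For mu <= m only the first mu stimuli have occurred (truncation);
    for mu > m the sequence is followed by mu - m null stimuli (padding).
    """
    if mu < 0:
        raise ValueError("mu must be non-negative")
    m = len(sequence)
    if mu <= m:
        return list(sequence[:mu])
    return list(sequence) + [NULL] * (mu - m)


print(effective_sequence(["S1", "S2", "S3"], 2))
print(effective_sequence(["S1", "S2"], 4))
```

Padding at the right-hand end matters only in a cross-coupled system, where activity may persist after the last stimulus.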


$Q_{ij}$ is redefined in a manner analogous to $Q_i$. Where $\mathcal{S}_i$ and $\mathcal{S}_j$ are any two sequences, we define

$Q_{i_\mu j_\nu}$ = probability that an A-unit responds at time $t$ if $\mathcal{S}_i$ begins at $t - \mu\tau$, and also responds at time $t$ if $\mathcal{S}_j$ begins at $t - \nu\tau$.

It is again assumed that the A-system is "silent" at the start of each sequence for which the Q-function is defined, and that if $\mu$ or $\nu$ is greater than $m$, the corresponding sequence is augmented by a sufficient number of null stimuli at the right-hand end. Q-functions with arbitrary numbers of subscripts can be generated by an obvious extension of the above definition.

In contexts where no ambiguity can arise, the notation $Q_{ij}$ will be used to denote $Q_{i_m j_{m'}}$, i.e., the probability that an A-unit responds immediately after the termination of $\mathcal{S}_i$ and also responds immediately after the termination of $\mathcal{S}_j$. Note that it is not required that the sequences $\mathcal{S}_i$ and $\mathcal{S}_j$ be commensurate, i.e., the lengths $m$ and $m'$ may be different for the two sequences, without requiring any redefinition of $Q_{ij}$.
Generalization coefficients, $g_{i_\mu j_\nu}$, can be defined analogously to Q-functions. In an alpha system, for example, $g_{i_\mu j_\nu}$ is a measure of the increment added to the output signal of the A-set responding after $\mu$ stimuli of the sequence $\mathcal{S}_i$, as a result of an $\alpha$-reinforcement after the $\nu$th stimulus of the sequence $\mathcal{S}_j$. Again, if the second-order subscripts are suppressed, it will be assumed that $g_{ij} = g_{i_m j_{m'}}$, the effect of a reinforcement immediately after the termination of $\mathcal{S}_j$ upon the signal which follows immediately after the termination of $\mathcal{S}_i$. If reinforcements are always applied and measured immediately after the end of stimulus sequences, the performance of the perceptron in learning responses to such sequences can be derived from the resulting G matrix, in precisely the same manner as was done for elementary perceptrons in Part II. Thus a knowledge of the Q-functions for a cross-coupled perceptron permits us to predict the performance of such systems in discrimination and generalization experiments.


18.2  $Q_{i_\mu}$ Functions and Stability

The rigorous analysis of $Q_{i_\mu}$ for a cross-coupled perceptron with a finite number of A-units presents the identical difficulty which was encountered in the case of Q-functions for multi-layer systems (Section 15.1). The probability $Q_{i_1}$ is, of course, identical to the function $Q_i$ defined for the first stimulus of the sequence $\mathcal{S}_i$ in accordance with the equations of Chapter 6; but the probability $Q_{i_2}$ already depends upon the distribution of numbers of A-units which respond to the first stimulus, $S_{i1}$. In order to avoid consideration of these distributions, the Q-functions obtained here will always represent limits for large networks, where it can be assumed that the actual proportion of A-units responding after $S_{i\mu}$ is equal to $Q_{i_\mu}$. It

should  be  noted  that  due  to  the  assumption  that  the  sequence  starts  with  a 

"silent"  perceptron,  '< 
A number of alternative topological models might be considered. For convenience, the following analysis takes up the case of a perceptron in which both the connections from the retina to the A-units and the "internal" connections to each A-unit are constrained as in the binomial model of Chapter 6. In this model, we have five parameters for each A-unit:

$\theta$ = threshold of A-unit

$x$ = number of excitatory connections from the S-set, or retina

$y$ = number of inhibitory connections from the retina

$x_a$ = number of excitatory connections from other A-units

$y_a$ = number of inhibitory connections from other A-units

In the present chapter, we shall be concerned only with perceptrons in which all input connections to A-units are fixed in value, regardless of where they originate. Systems with modifiable couplings between A-units will be considered in the following chapter. It is assumed that each of the above sets of connections has its origin points assigned at random from a uniform probability distribution over the S-set or the A-set, as required. This results in the following equation for $Q_{i_\mu}$:

$$Q_{i_\mu} = \sum_{E_s - I_s + E_a - I_a \,\ge\, \theta} P_x(E_s)\, P_y(I_s)\, P_{x_a}(E_a)\, P_{y_a}(I_a) \qquad (18.1)$$

where

$$P_x(E_s) = \binom{x}{E_s} R_{i\mu}^{E_s} \left(1 - R_{i\mu}\right)^{x - E_s}, \qquad P_y(I_s) = \binom{y}{I_s} R_{i\mu}^{I_s} \left(1 - R_{i\mu}\right)^{y - I_s},$$

$$P_{x_a}(E_a) = \binom{x_a}{E_a} Q_{i_{\mu-1}}^{E_a} \left(1 - Q_{i_{\mu-1}}\right)^{x_a - E_a}, \qquad P_{y_a}(I_a) = \binom{y_a}{I_a} Q_{i_{\mu-1}}^{I_a} \left(1 - Q_{i_{\mu-1}}\right)^{y_a - I_a},$$

$R_{i\mu}$ = fraction of S-units activated by $S_{i\mu}$.

Taking $Q_{i_0} = 0$, $Q_{i_\mu}$ can thus be developed recursively in terms of $Q_{i_{\mu-1}}$ up to any value of $\mu$.

For a Poisson model, in which the number of output connections from each A-unit is constrained but the number of inputs is a random variable (or in which both ends of a set of connections are picked at random), equation (18.1) still applies, but the probability functions $P_x$, $P_y$, $P_{x_a}$, and $P_{y_a}$ must be redefined, in a manner analogous to Chapter 6. It is also possible, of course, to have some kinds of connections (e.g., the internal excitatory connections) distributed binomially, while the other sets of connections are organized according to a Poisson model, so that $P_x, \ldots, P_{y_a}$ need not all be of the same type. For present purposes, however, we shall continue to concentrate on the pure binomial model defined above. All major conclusions undoubtedly apply to Poisson and mixed systems equally well.
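The recursion of equation (18.1) for the pure binomial model can be sketched as follows. This is an illustration under stated assumptions rather than the program used for the figures: connections are taken as unit-valued, a unit is active when its net signal is at least $\theta$, the limit of a large network is assumed, and the parameter names `x_a` and `y_a` (internal excitatory and inhibitory connection counts) and all function names are hypothetical.

```python
from math import comb


def binom_pmf(n, k, p):
    """Probability that k of n randomly placed connections have an active origin."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def next_q(area, q_prev, x, y, x_a, y_a, theta):
    """One cycle: sum the joint probability over all signal combinations
    whose net value (excitation minus inhibition) reaches threshold."""
    total = 0.0
    for e_s in range(x + 1):
        for i_s in range(y + 1):
            for e_a in range(x_a + 1):
                for i_a in range(y_a + 1):
                    if e_s - i_s + e_a - i_a >= theta:
                        total += (binom_pmf(x, e_s, area)
                                  * binom_pmf(y, i_s, area)
                                  * binom_pmf(x_a, e_a, q_prev)
                                  * binom_pmf(y_a, i_a, q_prev))
    return total


def q_history(areas, x, y, x_a, y_a, theta):
    """Activity after each stimulus of a sequence, starting from a silent A-system."""
    q, history = 0.0, []
    for area in areas:
        q = next_q(area, q, x, y, x_a, y_a, theta)
        history.append(q)
    return history


# A steadily maintained stimulus of area .2 (values of x_a, y_a chosen arbitrarily).
history = q_history([0.2] * 5, x=3, y=0, x_a=4, y_a=2, theta=2)
print([round(q, 3) for q in history])
```

Because the A-system starts silent, the first value depends only on the retinal parameters and reproduces the simple-perceptron result of Chapter 6.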
One of the first questions to be raised about such a system concerns the stability of the activity-level, and the possible tendency of the system to burst into total activity in response to a transient stimulus (which would, of course, preclude any possibility of learning or discrimination of different stimuli). Figure 48 illustrates the response to a transient stimulus (i.e., a sequence of length 1) for a number of representative cases. Figure 49 presents the response of a number of networks to a steadily maintained stimulus, or a sequence of stimuli all of which have the identical area. (Note that it follows from Equation (18.1) that the actual sequence of stimuli does not affect $Q_{i_\mu}$, so long as the stimulus area, $R_{i\mu}$, is fixed for each $S_{i\mu}$. Thus any two sequences for which the succession of $R_{i\mu}$ are equivalent will yield the same value of $Q_{i_\mu}$.)

Figure 48(a) illustrates the effect of the size of the "trigger stimulus" upon the transient response of the system. Note that the final activity
level  is  independent  of  ;  it  is  also  independent  of  /j  and  ,  so  long  as 

>  ,  -■  ■  ■  Figure  48(b)  shows  the  effect  of  varying  the  ratio  of  internal 

excitation to internal inhibition ($x_a$ and $y_a$). For a purely excitatory system, total activity of the network is likely to occur, in which all A-units become and remain active. As the inhibitory component is increased, a lower level of stable activity results, and with still further increase in $y_a$ relative to $x_a$, the initial transient activity will die away entirely. Figure 48(c) shows that the effect of increasing the threshold of the A-units is similar to the effect of increasing the internal inhibitory component. It should be noted that all of these $Q_{i_\mu}$ functions in response to transient phenomena in a cross-coupled system are identical to the succession of $Q^{(\nu)}$-functions for successive layers of a multilayer perceptron (as discussed in Chapter 15). For infinite $N_a$ the equations for $Q_i^{(\nu)}$ and $Q_{i_\nu}$ are identical, where $\nu$ in the first case denotes the layer, and in the second the cycle of activity in the A-system.
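The qualitative behavior described for transient stimuli can be reproduced with a few lines of computation. The following is a sketch under assumptions, not a reconstruction of Figure 48: it uses the large-network binomial recursion with unit-valued connections, no retinal inhibition ($y = 0$), a single stimulus of area .2 followed by null stimuli, and arbitrarily chosen internal connection counts. All names are hypothetical.

```python
from math import comb


def pmf(n, k, p):
    """Binomial probability of k active origins among n connections."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def step(area, q_prev, x, x_a, y_a, theta):
    """Activity after one cycle, given stimulus area and previous activity."""
    return sum(
        pmf(x, e_s, area) * pmf(x_a, e_a, q_prev) * pmf(y_a, i_a, q_prev)
        for e_s in range(x + 1)
        for e_a in range(x_a + 1)
        for i_a in range(y_a + 1)
        if e_s + e_a - i_a >= theta
    )


def transient(x_a, y_a, x=3, theta=2, area=0.2, cycles=20):
    """Response to a sequence of length 1 followed by null stimuli."""
    q, history = 0.0, []
    for cycle in range(cycles):
        q = step(area if cycle == 0 else 0.0, q, x, x_a, y_a, theta)
        history.append(q)
    return history


no_coupling = transient(x_a=0, y_a=0)    # simple perceptron: activity stops at once
excitatory = transient(x_a=10, y_a=0)    # purely excitatory coupling
balanced = transient(x_a=10, y_a=10)     # equal excitation and inhibition
print(round(no_coupling[-1], 3), round(excitatory[-1], 3), round(balanced[-1], 3))
```

With these particular counts the purely excitatory network climbs to total activity, while adding an equal inhibitory component holds the activity well below that level.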
Figure 49(a) shows that as the internal inhibitory component is increased to the point where the terminal steady-state level of the system is below the value of $Q_{i_1}$ for the initial impulse from the retina, a damped series of oscillations occurs, which becomes pronounced as $y_a$ is increased. Changing the threshold (as in Fig. 49(b)) also serves to reduce the asymptotic activity level, but does not cause the qualitative alteration from a monotonic to an oscillating sequence, as does the increase in $y_a$. A sequence which is either monotone or oscillating for one value of $\theta$ will remain monotone or oscillating as $\theta$ is changed.

18.3  $Q_{i_\mu j_\nu}$ Functions


The function $Q_{i_\mu j_\nu}$ for a binomial-model cross-coupled perceptron can be calculated by an extension of the treatment employed in the preceding section. The resulting equation (again assuming large $N_a$) is:

$$Q_{i_\mu j_\nu} = \sum P(E_s^i, E_s^j, E_s^c)\, P(I_s^i, I_s^j, I_s^c)\, P(E_a^i, E_a^j, E_a^c)\, P(I_a^i, I_a^j, I_a^c) \qquad (18.2)$$

where the sum is taken over all values satisfying

$$E_s^i + E_s^c + E_a^i + E_a^c - I_s^i - I_s^c - I_a^i - I_a^c \ge \theta,$$

$$E_s^j + E_s^c + E_a^j + E_a^c - I_s^j - I_s^c - I_a^j - I_a^c \ge \theta.$$

The above notation for excitatory and inhibitory signal components received from the "unique" and "common" sets of sensory points and A-units active at $t - \tau$ is an obvious extension of the notation employed previously (cf. Chapter 6). For the multinomial probabilities, we have
where, for connections from the retina, the probabilities of the common and unique sets are

$C$ = proportion of S-points activated both by $S_{i\mu}$ and $S_{j\nu}$,

$R_{i\mu} - C$, where $R_{i\mu}$ is the proportion of S-points activated by $S_{i\mu}$,

$R_{j\nu} - C$, where $R_{j\nu}$ is the proportion of S-points activated by $S_{j\nu}$,

and, for connections from other A-units, the corresponding probabilities are

$Q_{i_{\mu-1} j_{\nu-1}}$, $\quad Q_{i_{\mu-1}} - Q_{i_{\mu-1} j_{\nu-1}}$, $\quad Q_{j_{\nu-1}} - Q_{i_{\mu-1} j_{\nu-1}}$.

For arbitrary values of $\mu$ and $\nu$, $Q_{i_\mu j_\nu}$ can again be calculated by a recursive operation, assuming that the perceptron is "silent" prior to the start of each sequence. If the two sequences $\mathcal{S}_i$ and $\mathcal{S}_j$ are incommensurate (or if $\mu \ne \nu$) the values of $Q_{i_\mu j_\nu}$ are thus taken to be zero up to the time that both sequences have begun. (This is equivalent to extending the shorter sequence by adding a sufficient number of null stimuli at the beginning to make it equal in length to the longer sequence.)

Two questions are of particular importance concerning these functions. The first is the question of the sensitivity of the system to perturbations in a sequence of stimuli; this determines how well a cross-coupled perceptron can discriminate one stimulus sequence from another. The second question is the dependence of the present state of the system upon stimuli from the remote past; this is of importance in order to guarantee a sufficiently consistent response to a present stimulus so that it can be correctly identified, and also in justifying an approximation to the perceptron's performance by means of an analysis of finite sequences (as will be done in the following chapter). Figures 50 and 51 present the results of an investigation of these questions.

In Figure 50 the effect of a perturbation in the stimulus sequence is illustrated. In each case the sequence $\mathcal{S}_i$ is assumed to consist of 17 stimuli ($A_1, A_2, \ldots, A_{17}$). In the other sequences, one or more "perturbation stimuli" are introduced in place of some of the "A" stimuli;

The  data  for  these  illustrations  were  computed  by  W.  Eisner,  on  the 
Burroughs  220  computer  at  Cornell  University. 
these are denoted by the letter "B" in the figure. In Figure 50(a), a single "B" stimulus is introduced, in place of the eighth "A" stimulus, with $C$ (the intersection between the "B" stimulus and the corresponding "A" stimulus, $A_8$) being zero. We find that with $\theta = 2$, $Q_{ij}$ is abruptly reduced as soon as the "B" stimulus occurs, and then approaches a new asymptotic level, considerably below the $Q_i$ level. With a threshold of 3, however, the $Q_{ij}$ curve following the perturbation returns to the $Q_i$ level, so that three or four stimuli after the perturbation it is impossible to tell from the active A-set that the perturbation occurred. If the location of the "B" stimulus in the sequence is changed, the same type of $Q_{ij}$ curve is found, with the deflection merely being displaced in time, but not changed in magnitude. Figure 50(b) shows that the same asymptotic level is approached regardless of the value of $C$, as long as the "A" and "B" stimuli are not identical ($C < R$). In general, it appears that the asymptotic value of $Q_{ij}$ depends on the parameters of the network, but is independent of the magnitude of the perturbation.

Figure 50(c) shows that as the internal inhibitory component is increased, the asymptotic value of $Q_{ij}$ approaches the asymptotic value of $Q_i$, in much the same manner as when the threshold is increased. Finally, Figure 50(d) illustrates the effect of increasing the duration of the perturbation up to four "B" stimuli. Note that the return curve following the perturbation is practically identical in all cases.
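A perturbation experiment of this kind can also be imitated on a finite, randomly wired network. The sketch below is a Monte Carlo illustration, not the computation performed for Figure 50: network size, connection counts, and the random seed are arbitrary assumptions, connections are unit-valued, and the fraction of A-units active under both sequences is used as an estimate of the joint Q-function. All names are hypothetical.

```python
import random


def build_network(n_a, n_s, x, x_a, y_a, rng):
    """Fixed random wiring: x retinal inputs plus x_a excitatory and
    y_a inhibitory inputs from other A-units, for each A-unit."""
    return [
        (
            [rng.randrange(n_s) for _ in range(x)],
            [rng.randrange(n_a) for _ in range(x_a)],
            [rng.randrange(n_a) for _ in range(y_a)],
        )
        for _ in range(n_a)
    ]


def run(network, stimuli, theta):
    """Active A-set after each stimulus, starting from a silent A-system."""
    active, history = set(), []
    for stimulus in stimuli:
        nxt = set()
        for unit, (s_in, e_in, i_in) in enumerate(network):
            signal = (sum(p in stimulus for p in s_in)
                      + sum(a in active for a in e_in)
                      - sum(a in active for a in i_in))
            if signal >= theta:
                nxt.add(unit)
        active = nxt
        history.append(active)
    return history


rng = random.Random(0)
n_s, n_a = 100, 2000
network = build_network(n_a, n_s, x=3, x_a=4, y_a=4, rng=rng)
points = list(range(n_s))

# Seventeen "A" stimuli of area .2; the eighth is replaced by a disjoint "B".
seq_a = [set(rng.sample(points, 20)) for _ in range(17)]
stim_b = set(rng.sample([p for p in points if p not in seq_a[7]], 20))
seq_b = seq_a[:7] + [stim_b] + seq_a[8:]

hist_a = run(network, seq_a, theta=2)
hist_b = run(network, seq_b, theta=2)
activity = [len(a) / n_a for a in hist_a]                    # unperturbed level
overlap = [len(a & b) / n_a for a, b in zip(hist_a, hist_b)]  # joint response
for step_no, (q_i, q_ij) in enumerate(zip(activity, overlap), start=1):
    print(step_no, round(q_i, 3), round(q_ij, 3))
```

Up to the perturbation the two runs are identical, so the overlap equals the activity level; how quickly, or whether, the overlap recovers afterwards depends on the threshold and the internal inhibitory count chosen.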

Figure 51 demonstrates the effects of introducing null stimuli
at the beginning of each stimulus sequence, in place of the initial "A"
stimuli. The curves obtained are very similar to those obtained with a
perturbation of the sequence, and it is again found that by increasing the
threshold or the value of the inhibitory component, the A-set responding to the
altered sequence can be made to approach the set responding to the original,
unaltered sequence.

[Figure panel titles: (a) CURVES FOR $\theta = 2$; (b) CURVES FOR $\theta = 3$; (c) EFFECT OF VAR]

These results demonstrate that there are two distinct conditions
which may be found in a cross-coupled perceptron, depending on the choice of
parameters. With small $\theta$, or small values of the inhibitory component,
any perturbation or variation in the stimulus sequence will cause the system to
follow a unique course for all subsequent time, and the A-set which is active
at time $t$ depends on the entire sequence at all times prior to $t$, rather
than on the most recent stimuli. By increasing $\theta$ or the inhibitory
component, however, such a perceptron can be converted into the second type,
in which only the most recent stimuli appreciably affect the current state of
the A-system, and stimuli which are sufficiently remote in time have a
negligible effect. By lowering $\theta$ or the inhibitory component slightly,
the duration of the noticeable aftereffects of a sequence perturbation can be
increased, while still permitting an ultimate return to the A-states
associated with the unperturbed sequence. This means, in effect, that the
perceptron has a "short term memory" for sequences of a length commensurate
with the time for the $Q$ curve to return to its "normal" level, and such
sequences can be discriminated by the system. In discriminating such
sequences, the most recent stimuli will tend to dominate, and differences
which occur in the remote past will be harder to recognize. With the first
type of perceptron, however, which is obtained abruptly when the threshold
becomes low enough (or the inhibitory component becomes low enough), even the
most remote stimuli have about the same effect as the most recent stimuli, and
the current A-state gives relatively little information about what the present
stimuli actually are. Thus, in order to guarantee an adequate degree of
correlation between the activity state and the current stimuli, it is necessary
to maintain thresholds or inhibitory components at a sufficiently high level; a
perceptron of the first type is unlikely to be of much practical value.




19. ADAPTIVE PROCESSES IN CLOSED-LOOP CROSS-COUPLED PERCEPTRONS


In Chapter 18, cross-coupled perceptrons with fixed connection
networks were analyzed to determine their stability and characteristic
responses to sequences of stimuli. In earlier chapters, four-layer and
open-loop cross-coupled perceptrons were analyzed to show that an adaptive
preterminal network could vastly improve the capabilities of such systems for
similarity generalization. We now turn to the consideration of cross-coupled
perceptrons with adaptive interconnections between the A-units, and will
attempt to show that the same phenomena can be found here, in a more general
and more efficient form. The cross-coupled system not only recognizes
sequences of stimuli of arbitrary length, but tends to accelerate its adaptation
process due to positive feedback effects within the system. It will be shown
later that the closed-loop cross-coupled system is equivalent to an infinitely
extended open-loop system, analogous to the one described in Chapter 17.

The first attempt to demonstrate similarity generalization in
cross-coupled systems was that of Rosenblatt, in Ref. 85. This was a
partially analytic and partially heuristic argument, based upon a study of the
similarities of origin-point configurations of the A-units under an arbitrary
transformation, T. While the general predictions in this paper were correct,
and have subsequently been demonstrated in simulation experiments, the
method of analysis failed to yield quantitative predictions of the terminal
state of the system, after a prolonged period of pre-conditioning. The
method employed here is basically different, and yields a more general, as
well as more accurate, result. In the following sections, the time-dependent
evolution equations for the cross-coupled system will first be developed in
their most general form, and specific applications will then be made to
systems in which the assumptions and initial conditions are simplified, to
permit a more complete analysis. In the final sections, several similarity
generalization experiments will be presented, and performance will be
compared with that of multi-layer perceptrons.

19.1  Postulated  Organization  and  Dynamics 

The  perceptrons  to  be  analyzed  in  this  chapter  will  be  assumed,  for 
convenience,  to  be  fully  cross-coupled,  that  is,  there  is  a  connection  from 
every  A-unit  to  every  other  A-unit  and  to  itself  as  well.  It  can  be  shown  that 
the  conclusions  which  we  shall  reach  for  such  a  system  can  be  extended  to  any 
perceptron  for  which  the  number  of  cross-coupling  connections  per  A-unit  is 
large,  and  the  termini  of  the  connections  are  assigned  at  random. 

Connections from S to A-units are assumed to be fixed in value, and
connections from A to R-units are modifiable according to any of the usual
reinforcement rules. (We shall not be concerned here with the reinforcement
of A-R connections, but shall concentrate upon the evolution of the association
network itself.) The A-units are assumed to be simple, with threshold $\theta$,
and output signals $a^* = 1$ or $0$. The transmission time for all connections
is a constant $\tau$. Stimuli are assumed to occur at intervals of the
transmission time, $\tau$.


Interconnections among A-units are assumed to be variable,
according to the same rule employed for the four-layer system of Chapter 16;
namely, if $a_i$ is active at time $t$, and $a_j$ is active at time $t + \tau$,
the value of the connection $v_{ij}$ is increased by a quantity $\eta\,\Delta t$,
and at the same time, all values $v_{ij}$ decay by the quantity
$\delta\,\Delta t\,(v_{ij})$. The time unit, $\Delta t$, will generally be
considered large relative to $\tau$. In symbols, we have

$$\Delta v_{ij}(t) = \begin{cases} (\eta - \delta v_{ij})\,\Delta t & \text{if } a_i^*(t-\tau)\,a_j^*(t) = 1 \\ -\delta\,\Delta t\,(v_{ij}) & \text{otherwise} \end{cases} \qquad (19.1)$$

Thus the total signal, $\alpha_i(t)$, received by the A-unit $a_i$ at time $t$
consists of a fixed-connection component, $\beta_i$, originating from the
retina, and a variable component, $\gamma_i(t)$, coming from those A-units
which were active at $t - \tau$.
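The rule (19.1) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `update_connections`, the use of NumPy, and the array layout (`v[i, j]` for the connection from unit i to unit j) are choices made here and do not come from the text.

```python
import numpy as np

def update_connections(v, active_prev, active_now, eta, delta, dt):
    """Apply one step of the rule (19.1) to the A-to-A value matrix.

    v[i, j] is the value of the connection from unit i to unit j.
    active_prev[i] is 1 if unit i was active at t - tau, else 0.
    active_now[j] is 1 if unit j is active at t, else 0.
    """
    v = np.asarray(v, dtype=float)
    # Connections whose origin fired at t - tau and whose terminus fires at t.
    reinforced = np.outer(active_prev, active_now).astype(bool)
    # Every connection decays by delta * v * dt; reinforced ones also gain eta * dt.
    return v + (eta * reinforced - delta * v) * dt

# Hypothetical three-unit example: unit 0 fired first, units 1 and 2 fire now.
example = update_connections(np.ones((3, 3)), [1, 0, 0], [0, 1, 1],
                             eta=0.5, delta=0.1, dt=1.0)
```

In this example the two reinforced connections move from 1 to 1 + (0.5 - 0.1) and every other connection decays from 1 to 0.9.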

19.2  The  Phase  Space  of  the  A-units 




Let us suppose that the environment of a cross-coupled perceptron
consists of exactly $n$ admissible stimulus sequences. In order to obtain a
G-matrix for this perceptron, and predict its performance, it is necessary to
know how its A-units will respond to each of the admissible sequences,
including the response to the 1st, 2nd, ..., $m$th member of the sequence. We
will use the notation $a_i^*(S_j^{(k)})$ to denote the output signal of the
unit $a_i$ following the $k$th stimulus of the sequence $S_j$. If the sequence
$S_j$ begins at $t - k\tau$, the $k$th stimulus will occur at $t - \tau$, and
the input to the unit $a_i$ at time $t$ is given by

$$\alpha_i^{(j,k)}(t) = \beta_i^{(j,k)} + \gamma_i^{(j,k)}(t) \qquad (19.2)$$

where $\beta_i^{(j,k)}$ is the sum of the signals received from the retina
following the occurrence of the $k$th stimulus of $S_j$, and
$\gamma_i^{(j,k)}(t)$ is the sum of the signals received from other A-units at
time $t$, given that $S_j$ began at $t - k\tau$. Knowing $\alpha_i^{(j,k)}$, we
can readily determine $a_i^*(S_j^{(k)})$, since

$$a_i^*(S_j^{(k)}) = \begin{cases} 1 & \text{if } \alpha_i^{(j,k)} \geq \theta \\ 0 & \text{otherwise} \end{cases}$$

In the perceptrons to be considered here, $\beta_i^{(j,k)}$ is a constant, while
$\gamma_i^{(j,k)}(t)$ is a time-dependent variable (as in the four-layer
perceptrons of Chapter 16).

It will be convenient to represent each abbreviated sequence
consisting of the first $k$ members of any of the original $n$ sequences by
a full sequence of length $k$. If $m$ is the maximum sequence length, this
results in a set of at most $mn$ sequences. Let $N$ be the number of such
sequences, and let them be numbered from $S_1$ through $S_N$. Then in
terms of these new sequences, we can obtain all of the
$a_i^*(S_j^{(k)}) = a_i^*(S_r)$, where $S_r$ is the sequence corresponding to
the first $k$ members of the original sequence $S_j$. The notation
$a_i^*(S_r)$ means the signal from $a_i$ following the last member of sequence
$S_r$. Similarly, we have $\alpha_i^{(r)} = \beta_i^{(r)} + \gamma_i^{(r)}$.
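The construction of the abbreviated sequences can be illustrated with a short sketch. It assumes that sequences are given as tuples of stimulus labels and that identical prefixes of different sequences are counted once; both are assumptions made for this example, and the function name `abbreviated_sequences` is hypothetical.

```python
def abbreviated_sequences(sequences):
    """Return every distinct non-empty prefix, in order of first appearance."""
    seen = {}
    for seq in sequences:
        for k in range(1, len(seq) + 1):
            # A prefix of length k stands for "the first k members" of seq.
            seen.setdefault(tuple(seq[:k]), None)
    return list(seen)

# Hypothetical environment: n = 3 sequences, maximum length m = 3.
original = [("A", "B", "C"), ("A", "B", "D"), ("E",)]
prefixes = abbreviated_sequences(original)
n, m, N = len(original), max(len(s) for s in original), len(prefixes)
```

Here the prefixes are (A), (A, B), (A, B, C), (A, B, D) and (E), so N = 5, which respects the bound of at most mn = 9 sequences.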

All of the information necessary to predict the response of an
A-unit $a_i$ at time $t$ can now be obtained from the $2N$ numbers
$\beta_i^{(1)}, \ldots, \beta_i^{(N)}, \gamma_i^{(1)}(t), \ldots, \gamma_i^{(N)}(t)$.
Thus the set of all possible signals (divided into retinal and internal
components) which might affect the activity of $a_i$ at time $t$ can be
represented by a vector of $2N$ components, which depends on $t$. The space of
all such vectors can be mapped into a Euclidean $2N$-space, where each
point represents a possible A-unit, or set of A-units, of the perceptron.

This will be called the phase space of the A-units. For a large, or infinite
perceptron, there is likely to be some concentration of A-units at each
point in this phase space at time $t$. Thus, at time $t$, there is a
probability density associated with each point in the phase space. The state of
the entire association system at a given time, $t$, can then be represented by
a probability density distribution over the phase space of the A-units.
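As a rough illustration of how the coordinates of a unit in this phase space could be computed, the sketch below steps a small, fully cross-coupled network through a sequence and returns the retinal and internal components following the last stimulus. The function name, the dictionary of retinal signals, and the assumption that the network is silent before the sequence begins are illustrative choices, not part of the original analysis.

```python
import numpy as np

def phase_coordinates(seq, beta_by_stimulus, v, theta):
    """Return (beta, gamma, output) for every A-unit after the last stimulus.

    seq: non-empty list of stimulus labels; the network is silent beforehand.
    beta_by_stimulus: maps a label to the vector of retinal signals beta_i.
    v: matrix with v[j, i] the value of the connection from unit j to unit i.
    """
    output = np.zeros(v.shape[0])            # silent initial state
    for label in seq:
        beta = np.asarray(beta_by_stimulus[label], dtype=float)
        gamma = output @ v                   # signals from units active one step earlier
        alpha = beta + gamma                 # total input, beta plus gamma
        output = (alpha >= theta).astype(float)
    return beta, gamma, output

# Hypothetical three-unit network: unit 0 sends a connection of value 1 to unit 1.
retina = {"A": [2.0, 0.0, 0.0], "B": [0.0, 1.0, 0.0]}
v = np.array([[0.0, 1.0, 0.0], [0.0, 0.0, 0.0], [0.0, 0.0, 0.0]])
beta_ab, gamma_ab, out_ab = phase_coordinates(["A", "B"], retina, v, theta=2.0)
_, _, out_b = phase_coordinates(["B"], retina, v, theta=2.0)
```

With these values unit 1 responds to "B" only when "B" is preceded by "A", because the internal component supplied by unit 0 lifts its input to the threshold.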
For convenience of notation, parentheses for superscripts of
$\alpha$, $\beta$, and $\gamma$ components will hereafter be omitted, with the
understanding that the symbol $\beta_i^r$ means the $\beta$-component for unit
$a_i$ from stimulus sequence $S_r$. If exponents are required, they will be
expressed by the notation $(\beta_i^r)^x$, which would be $\beta_i^r$ to the
$x$th power. It should be remembered that with the symbols $\alpha$, $\beta$,
and $\gamma$, subscripts always denote A-units, whereas superscripts indicate
stimulus sequences.

19.3  The  Assumption  of  Finite  Sequences 

In analyzing the performance of a perceptron, it will generally
be our objective to predict the condition of the association system in the limit,
as the length of the preconditioning sequence becomes infinite. This means
that there are generally an infinite number of possible sequences in the
environment, and the phase space of the A-units is properly represented by
an infinite dimensional Euclidean space. To justify later assumptions,
however, it is necessary to assume that the preconditioning sequence is actually
composed of a mixture of a finite number of subsequences of finite length.
While this assumption will be carried through the analysis of the following
section, it will be shown later that it is possible to drop the assumption in
the case of periodic preconditioning sequences.

Justification for an assumption of finite sequences can be found
in one of two ways. First, we may assume that only the $m$ stimuli prior to
time $t$ can have any appreciable effect on the activity state of the A-system
at time $t$. In this case, we need consider only sequences of length $m$ as
possible determinants of the activity at time $t$. Note that this assumption
applies only to the activity state of the system, and not to the values of the
connections or memory state of the network, which clearly depends on all prior
time. Such an assumption appears to be supported by the analyses of the last
chapter, which show that for suitable parameters, only the most recent stimuli
affect the activity state of the system at time $t$, progressively more remote
stimuli making a progressively smaller contribution, which soon becomes
negligible. Specifically, it has been shown that with suitable parameters, it
makes no significant difference to assume that the sequence began at time
$t - m\tau$, rather than at some earlier time, which is equivalent to the
assumption of a finite universe of sequences of length $m$, in place of the
universe of infinite sequences.

An alternative approach, for which a rigorous analysis rather than
a mere approximation is possible, is the following: Assume that the activity
of the A-units is "quenched" after every $m$ stimuli; i.e., the perceptron is
shown only sequences of length $m$, and at the end of each such sequence, its
activity is interrupted by setting all $a_i^*(t) = 0$, so that the next sequence
begins with the perceptron in a "silent" state, as required. Let us analyze the
performance of such a perceptron (for which the dimension of the phase space
is finite) and then let $m$ approach infinity. The limiting behavior of such a
system should correspond to a perceptron in which the sequences are
uninterrupted. For specificity, and to permit a rigorous analysis, this type of
interrupted-activity system will be assumed in the following analysis, although
it will be shown later that the results can be extended to a more general case.

In keeping with the above assumption, it will be assumed that
there are a total of $N$ possible subsequences which comprise the
preconditioning sequence of the perceptron, symbolized $S_1, \ldots, S_N$. The
phase space therefore has dimension $2N$, and it is assumed that no stimulus
sequence (i.e., no subsequence) has more than $m$ members (where $m$ is
finite). By selecting both $\eta$ and $\delta$ sufficiently small, it can be
guaranteed that the change in the memory state of the perceptron during a
single sequence of length $m$ is negligible, or infinitesimal, so that the
output signal $a_i^*(S_r)$ depends only on $S_r$ and the memory state of the
system at the start of the sequence, and does not depend on changes in the
memory state which occurred during the sequence itself.

19.4 General Analysis: The Time-Dependent Equation

Given the probability density over the phase space of the A-units
at time $t$, it is possible to obtain the Q-functions for any pair of sequences
$S_r$ and $S_s$ (which may differ in length) by integrating the probability
density over the region of phase space for which $a^*(S_r) = a^*(S_s) = 1$.
That is, we integrate over the region for which $\alpha^r \geq \theta$ and
$\alpha^s \geq \theta$. The subscript denoting particular A-units is suppressed
here, since we are concerned only with the density of such A-units, and not
with their individual identity.
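For a finite population of A-units, the integration described above reduces to counting. The following sketch assumes each unit is represented by its total input for the two sequences; the function name `q_function` and the sample values are hypothetical.

```python
import numpy as np

def q_function(alpha_r, alpha_s, theta):
    """Fraction of A-units whose inputs for both sequences reach the threshold."""
    alpha_r = np.asarray(alpha_r, dtype=float)
    alpha_s = np.asarray(alpha_s, dtype=float)
    both_active = (alpha_r >= theta) & (alpha_s >= theta)
    return both_active.mean()

# Four hypothetical units; only the first responds to both sequences.
alpha_r = [3.0, 1.0, 2.0, 0.0]
alpha_s = [2.0, 2.0, 1.0, 0.0]
q_rs = q_function(alpha_r, alpha_s, theta=2.0)
q_rr = q_function(alpha_r, alpha_r, theta=2.0)
```

Passing the same inputs twice gives the measure of the set responding to a single sequence.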


The object of a general analysis of the evolution of the association
system in such a perceptron is to describe the "flow" of A-units in this
phase space, so as to obtain the density function at time $t$ as a function of
the initial distribution and the stimulus sequences to which the perceptron
has been exposed. The system can be represented by a sort of hydrodynamic
model; the probability density in the phase space is treated as a sort of
compressible fluid, in which convection phenomena occur, but in which
there is no diffusion, since it will be seen that the A-units which initially
occupy a given point in phase space will always move together, in unison,
rather than following unique paths. Throughout this analysis, it will be assumed
that we are dealing with finite stimulus sequences (as described in Section 19.3),
and that the rate of flow (the length of the velocity vector) for all points in
the phase space is infinitesimal over the duration of the longest sequence.
The history of the perceptron, then, consists of an endless sequence of such
finite sub-sequences, so that at a given point in time, the perceptron can be
assumed to be exposed to a mixture of all possible sequences, each weighted
according to its probability. The velocity vector for a given point in phase
space at time $t$ then depends on the combination of velocity components
contributed by each of the stimulus sequences to which the perceptron is
exposed.


We have seen that each A-unit, $a_i$, is characterized by a set
of coordinates in phase space at time $t$, namely
$(\beta_i^1, \beta_i^2, \ldots, \beta_i^N, \gamma_i^1, \gamma_i^2, \ldots, \gamma_i^N)$.
For the given A-unit, the $\beta$-components are fixed for all time, while the
$\gamma$-components depend on $t$. Thus, to follow the history of this A-unit
(or point in phase space) we must determine the velocity vector
$(\dot\gamma_i^1, \dot\gamma_i^2, \ldots, \dot\gamma_i^N)$ as a function of
time for the point $(\beta_i, \gamma_i)$.

We consider first the effect of the reinforcement which occurs
for the last stimulus in a sequence upon the component $\gamma_i^r$. To be
specific, suppose sequence $S_q$ occurs at time $t$, and $S_r$ occurs at
$t + \Delta t$, and assume the transmission time $\tau \ll \Delta t$. Then the
(infinitesimal) change in $\gamma_i^r$ due to having reinforced the last
stimulus in sequence $S_q$ at time $t$ will be denoted by
$\Delta\gamma_i^{r,q}(\beta_i, \gamma_i(t))$. It is a function of the location
of the point in phase space whose motion is being traced, at time $t$. Note
that although only the effect due to the last stimulus of the sequence is
considered, all abbreviated sequences are present among the $N$ possible
sequences, so that if we know the effect of reinforcing the terminal stimulus
in each case, the effect of all possible reinforcements can be calculated.


A notation for the sequence corresponding to $S_r$ with its
terminal member omitted (i.e., the sequence abbreviated by one
stimulus) will be required. We shall use the symbol $S_{r'}$ to denote such
an abbreviated sequence. The change in the memory state due to the last
stimulus of sequence $S_q$ is then attributable to the modification of the
values of those connections which originate in the set of A-units which respond
to $S_{q'}$ and which terminate in the set of A-units responding to $S_q$. From
equation (19.1) we see that each such connection gains a quantity of value
$\Delta v_{ij} = (\eta - \delta v_{ij})\,\Delta t$, while all other connections
lose a quantity $\delta v_{ij}\,\Delta t$.


Figure 52 illustrates the relationship of the A-unit sets which
are involved in this transaction, and shows the increments to $\gamma_i^r$
which result from the occurrence of $S_q$ at time $t$. The sets responding at
time $t$ and $t - \tau$ are designated $A_q(t)$ and $A_{q'}(t)$, respectively.
The set $A_{r'}(t + \Delta t)$ is the set responding to the preterminal
stimulus of sequence $S_r$. The measures of these sets are $Q_q$ and $Q_{q'}$.
Since it was assumed that all A-units are interconnected, the measure of
the set of connections for which $\Delta v_{ij} = (\eta - \delta v_{ij})\,\Delta t$
is $Q_{q'}(t)\,Q_q(t)$, and the measure of the set of connections for which
$\Delta v_{ij} = -\delta v_{ij}\,\Delta t$ is $1 - Q_{q'}(t)\,Q_q(t)$. If
$a_i \notin A_q(t)$, all of its input connections lose $\delta v_{ji}\,\Delta t$.
But we are particularly interested in the change in $\gamma_i^r$, which is the
sum of the changes of value for all connections originating in the set
$A_{r'}(t + \Delta t)$, and terminating on the arbitrary unit $a_i$, whose
coordinates are $(\beta_i, \gamma_i(t))$. These connections can be divided into
three subsets:

(1) Connections which originate from the intersection
$A_{r'}(t + \Delta t) \cap A_{q'}(t)$ and terminate in $A_q(t)$ change by
$(\eta - \delta v_{ji})\,\Delta t$.

(2) Connections which originate from the set
$A_{r'}(t + \Delta t) - A_{r'}(t + \Delta t) \cap A_{q'}(t)$ and terminate in
$A_q(t)$ change by $-\delta v_{ji}\,\Delta t$.

(3) All connections which originate from the set $A_{r'}(t + \Delta t)$
and terminate outside of $A_q(t)$ change by $-\delta v_{ji}\,\Delta t$.

[Figure 52. EFFECT OF REINFORCING SEQUENCE $S_q$ UPON $\gamma_i^r$. Panels: A-sets responding to abbreviated sequences; A-sets responding to full sequences.]

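Now let us consider the difference equation

$$\Delta\gamma_i^{r,q} = \gamma_i^r(t + \Delta t) - \gamma_i^r(t) \qquad (19.3)$$

for the A-unit $a_i$ whose location at time $t$ is $(\beta_i, \gamma_i(t))$.
Since $\gamma_i^r = \sum_{a_j \in A_{r'}} v_{ji}$, we can make the substitutions:

$$\gamma_i^r(t + \Delta t) - \gamma_i^r(t) = \sum_{a_j \in A_{r'}(t + \Delta t)} v_{ji}(t + \Delta t) - \sum_{a_j \in A_{r'}(t)} v_{ji}(t)$$

$$= \sum_{a_j \in A_{r'}(t)} \Delta v_{ji} + \sum_{a_j \in [A_{r'}(t + \Delta t) - A_{r'}(t)]} v_{ji}(t + \Delta t) - \sum_{a_j \in [A_{r'}(t) - A_{r'}(t + \Delta t)]} v_{ji}(t + \Delta t)$$

Making these substitutions yields:

$$\Delta\gamma_i^{r,q} = \sum_{a_j \in A_{r'}(t)} \Delta v_{ji} + \sum_{a_j \in \Delta A_{r'}} v_{ji}(t + \Delta t) \qquad (19.4)$$

where $\Delta A_{r'} = [A_{r'}(t + \Delta t) - A_{r'}(t)] - [A_{r'}(t) - A_{r'}(t + \Delta t)]$,
that is, the set of A-units added or subtracted from the set $A_{r'}(t)$ during
the period $\Delta t$ (values from subtracted units entering the sum with a
negative sign). The first sum represents the change of value of the set of
connections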
which originate in $A_{r'}(t)$ and are reinforced at time $t$ due to sequence
$S_q$. This change in value is readily obtained from the components listed
above, and is given by

$$\sum_{a_j \in A_{r'}(t)} \Delta v_{ji} = \begin{cases} \left[\eta N_a Q_{q'r'}(t) - \delta\,\gamma_i^r(t)\right]\Delta t & \text{for } a_i \in A_q(t) \\ -\delta\,\Delta t\,\gamma_i^r(t) & \text{for } a_i \notin A_q(t) \end{cases}$$

which may be combined in the form

$$\sum_{a_j \in A_{r'}(t)} \Delta v_{ji} = \left[\eta N_a Q_{q'r'}(t)\,\phi(\alpha_i^q) - \delta\,\gamma_i^r(t)\right]\Delta t \qquad (19.5)$$

where, as before, $\phi(\alpha) = 1$ for $\alpha \geq \theta$, and 0 otherwise,
and $\gamma_i^r(t)$ has been substituted for $\sum_{a_j \in A_{r'}(t)} v_{ji}$.
Here $N_a Q_{q'r'}(t)$ is the number of A-units in the intersection of
$A_{q'}(t)$ and $A_{r'}(t)$.

The second sum in (19.4) represents the value of the set of
connections which originate from the incremental set, $\Delta A_{r'}$. For this
sum, it will be convenient to substitute the symbol $\Delta^*\gamma_i^{r,q}$.
Thus, (19.4) becomes

$$\Delta\gamma_i^{r,q} = \left[\eta N_a Q_{q'r'}(t)\,\phi(\alpha_i^q) - \delta\,\gamma_i^r(t)\right]\Delta t + \Delta^*\gamma_i^{r,q} \qquad (19.6)$$

where the subscript $i$ indicates that the subscripted variable is a component
of the vector $(\beta_i, \gamma_i)$ for the unit $a_i$.
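The bookkeeping behind (19.5) can be checked numerically on a small random network. The sketch below is only a consistency check under assumed conventions: `v[j, i]` holds the value of the connection from unit j to unit i, the three A-sets are drawn at random as boolean masks, and the count of units in the overlap of the two abbreviated-sequence sets stands in for the product of the number of A-units and the corresponding Q value.

```python
import numpy as np

rng = np.random.default_rng(0)
n_units, eta, delta, dt = 8, 0.3, 0.05, 0.01

v = rng.random((n_units, n_units))        # v[j, i]: value of connection j -> i
in_q_prime = rng.random(n_units) < 0.5    # units responding to the abbreviated S_q
in_q = rng.random(n_units) < 0.5          # units responding to S_q
in_r_prime = rng.random(n_units) < 0.5    # units responding to the abbreviated S_r

# Rule (19.1): connections from A_q' into A_q gain eta*dt; all decay by delta*v*dt.
reinforced = np.outer(in_q_prime, in_q)
dv = (eta * reinforced - delta * v) * dt

# Left side: summed change over connections from A_r' into each unit i.
lhs = dv[in_r_prime, :].sum(axis=0)

# Right side: [eta * (size of overlap) * phi_i - delta * gamma_i] * dt.
overlap = np.count_nonzero(in_r_prime & in_q_prime)
gamma_r = v[in_r_prime, :].sum(axis=0)
rhs = (eta * overlap * in_q - delta * gamma_r) * dt

agree = np.allclose(lhs, rhs)
```

The two sides agree for every unit, whether or not it belongs to the set responding to the reinforced sequence, which is the content of the combined form.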
Now suppose each possible "conditioning sequence", $S_q$,
occurs with a probability $P_q$, and that a statistically uniform mixture of
all such sequences occurs at time $t$. This supposition is justified by
our assumption that the length of each sequence is infinitesimal, relative
to the rate of change in the memory-state of the perceptron. In that case,
we obtain from (19.6)

$$\Delta\gamma_i^r(\beta_i, \gamma_i(t)) = \sum_q P_q\,\Delta\gamma_i^{r,q}(\beta_i, \gamma_i(t)) = \Big[\eta N_a \sum_q P_q Q_{q'r'}(t)\,\phi(\alpha_i^q) - \delta\,\gamma_i^r(t)\Big]\Delta t + \Delta^*\gamma_i^r(t) \qquad (19.7)$$

where $\Delta^*\gamma_i^r(t)$ = value added or subtracted due to connections
originating from the combined incremental set due to all $S_q$. If we now
divide both sides by $\Delta t$ and allow $\Delta t$ to approach zero, we obtain
the differential equation for the velocity component $\dot\gamma_i^r(t)$ for
the unit $a_i$:

$$\frac{d\gamma_i^r}{dt} = \eta N_a \sum_q P_q Q_{q'r'}(t)\,\phi(\alpha_i^q) - \delta\,\gamma_i^r(t) + \frac{d^*\gamma_i^r(t)}{dt} \qquad (19.8)$$

where $\dfrac{d^*\gamma_i^r(t)}{dt} = \lim_{\Delta t \to 0} \dfrac{\Delta^*\gamma_i^r(t)}{\Delta t}$.*

* Strictly speaking, this is either zero, or fails to exist. However, this
expression will be restated below in terms of delta-functions (see
Equation 19.9).

Note that the quantity $d^*\gamma_i^r(t)/dt$ is zero except at those times
that new A-units are added to the set $A_{r'}(t)$, since it represents the sum
of the values in the incremental set $\Delta A_{r'}$. Again, we note that for
sequences of length 2 or less, the set $A_{r'}(t)$ never changes, since new
units can be added to the set only if $\phi(\alpha^{r'})$ changes from 0 to 1,
and for sequences of length 2, $\phi(\alpha^{r'}) = \phi(\beta^{r'})$, which is
constant. Similarly, for sequences of length 2 or less, $Q_{q'r'}$ is constant.
Consequently, for these conditions, the equation (19.8) is equivalent to
(16.11), apart from a substitution of symbols. In the general case, however,
$d^*\gamma_i^r(t)/dt$ is not always zero; at those times that new A-units are
added to the set $A_{r'}$, an unknown increment to the value of $\gamma_i^r$
occurs, which depends upon the values of the connections from those units whose
$a^*$ has just become equal to 1. This quantity is exceedingly difficult to
calculate, as it depends upon detailed correlation of the phase-space vectors
for the new transmitting units and the vector for the receiving unit, $a_i$.
Fortunately, it can be shown that the steady-state solution to (19.8) does not
depend upon the actual value of the last term, even though it affects the rate
of convergence to the steady-state condition.

In the general case, the solution of (19.8) is discontinuous, unlike
the solution of (16.11), which was always continuous despite its discontinuous
derivative. From the above discussion as to the nature of
$d^*\gamma_i^r(t)/dt$, it becomes clear that (19.8) can be rewritten in terms
of Dirac delta-functions:

$$\frac{d\gamma_i^r}{dt} = \eta N_a \sum_q P_q Q_{q'r'}(t)\,\phi(\alpha_i^q) - \delta\,\gamma_i^r(t) + \sum_n c_i^r(t_n)\,\delta(t - t_n) \qquad (19.9)$$

where $t_n$ is any time at which one or more of the $\phi(\alpha^{r'})$ changes
from 0 to 1 or vice versa, $c_i^r(t_n)$ is the increment to $\gamma_i^r$ at that
time, and $\delta(t - t_n)$ denotes the Dirac delta-function (not the decay
constant $\delta$).
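19.5 Steady State Solutions

Consider the equilibrium equation corresponding to (19.9). If
an equilibrium exists at time $t$, then no $\phi(\alpha)$ can change its value
at time $t$, and thus the last term of (19.9) is zero at this time. Thus, a
steady state solution must correspond to a solution to the equation

$$\frac{d\gamma_i^r}{dt} = \eta N_a \sum_q P_q Q_{q'r'}(\infty)\,\phi(\alpha_i^q(\infty)) - \delta\,\gamma_i^r(\infty) = 0 \qquad (19.10)$$

which gives

$$\gamma_i^r(\infty) = \frac{\eta N_a}{\delta} \sum_q P_q Q_{q'r'}(\infty)\,\phi(\alpha_i^q(\infty)) \qquad (19.11)$$

or, substituting for $Q_{q'r'}$,

$$\gamma_i^r(\infty) = \frac{\eta}{\delta} \sum_q P_q\,\phi(\alpha_i^q(\infty)) \sum_j \phi(\alpha_j^{q'}(\infty))\,\phi(\alpha_j^{r'}(\infty)) \qquad (19.12)$$

Note that the terminal vector $\gamma_i(\infty)$ of an A-unit (in a given
system) depends only on the starting vector $(\beta_i, \gamma_i(0))$, so that we
can also write in place of (19.12),

$$\gamma^r(\beta, \gamma_0; \infty) = \frac{\eta N_a}{\delta} \sum_q P_q\,\phi(\alpha^q(\infty)) \sum_{(\tilde\beta, \tilde\gamma_0)} P(\tilde\beta, \tilde\gamma_0)\,\phi(\tilde\alpha^{q'}(\infty))\,\phi(\tilde\alpha^{r'}(\infty)) \qquad (19.13)$$

where $P(\beta, \gamma_0)$ is the probability that an A-unit is initially
situated at the point $(\beta, \gamma_0)$ in the phase space, and quantities
marked with a tilde refer to the point $(\tilde\beta, \tilde\gamma_0)$ over
which the sum is taken. Thus, in this form, the steady-state solution requires
no knowledge of the individual A-units and their connections, but depends only
on the initial point-mass distribution over the phase space. The corresponding
time-dependent differential equation represents the velocity vector for an
element of probability-mass in this phase space.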


Now a possible solution of (19.13) can be found by the following
iterative procedure: Assume that initially, the values of all A-A connections
are zero, so that $\gamma_i^r = 0$ for all units, and (19.13) depends only on
the $\beta$-vectors. Begin by inserting $\gamma^r = 0$ on the right-hand side
of (19.13), and compute the resulting approximation for
$\gamma^r(\beta; \infty)$, for all possible $\beta$-vectors (or for all units,
$a_i$). The first approximation for $\gamma$ is then inserted on the right-hand
side, to obtain the next approximation, etc. If we let $\gamma_{i(n)}^r$
represent the result of the $n$th iteration, we have

$$\gamma_{i(n+1)}^r = \frac{\eta N_a}{\delta} \sum_q P_q\,\phi(\beta_i^q + \gamma_{i(n)}^q) \sum_{\tilde\beta} P(\tilde\beta)\,\phi(\tilde\beta^{q'} + \tilde\gamma_{(n)}^{q'})\,\phi(\tilde\beta^{r'} + \tilde\gamma_{(n)}^{r'}) \qquad (19.14)$$

We will now attempt to show that this iteration must converge in a finite
number of steps to the solution of the differential equation (19.9), for
equivalent initial conditions.


We first show that the iteration process itself converges in less
than $NM$ steps (where $N$ = the number of stimulus sequences, and $M$ =
the number of $\beta$-vectors for which $P(\beta) > 0$). On the first
iteration, it is clear that the $\gamma$'s can only increase, since they start
out from zero, and are set equal to a non-negative quantity. But introducing
this quantity for the next iteration can only increase the $\phi$'s from zero
to 1; it cannot cause any to decrease. Consequently, on the next iteration, the
$\gamma$'s can again only increase, and similarly for each subsequent
iteration. Since $\gamma^r$ is non-decreasing, $\phi(\beta^r + \gamma^r)$ is
non-decreasing, for all $r$. But $\gamma$ can change only when some $\phi$
changes, and each $\phi$ can change at most once (from 0 to 1). But there are
at most $NM$ $\phi$-functions, $\phi(\alpha^r(\beta))$. If all of these are
initially zero, the system is already at a solution, and no further changes
will occur. Therefore, at most $n < NM$ $\phi$-functions can change,
and the process must converge in less than $NM$ iterations.
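The iterative procedure can be tried out on a toy population. The sketch below is a minimal illustration under several assumptions that are not in the text: the phase space holds a finite list of points with given probabilities, `pred[r]` gives the index of sequence r with its last stimulus removed (or -1 for a one-stimulus sequence, whose abbreviated set is taken to be empty), and `gain` stands for the constant formed from the reinforcement increment, the number of A-units, and the decay rate.

```python
import numpy as np

def iterate_steady_state(beta, p_point, p_seq, pred, theta, gain, max_iter=1000):
    """Iterate the equilibrium equation from gamma = 0 until it stops changing.

    beta[k, r]: retinal component for phase-space point k and sequence r.
    p_point[k]: probability mass of point k; p_seq[q]: probability of sequence q.
    Returns (gamma, number of iterations that changed gamma).
    """
    n_points, n_seq = beta.shape
    gamma = np.zeros((n_points, n_seq))
    for step in range(max_iter):
        phi = (beta + gamma >= theta).astype(float)      # phi[k, r]
        # q_pair[q, r]: measure of points responding to both abbreviated sequences.
        q_pair = np.zeros((n_seq, n_seq))
        for q in range(n_seq):
            for r in range(n_seq):
                if pred[q] >= 0 and pred[r] >= 0:
                    q_pair[q, r] = np.sum(p_point * phi[:, pred[q]] * phi[:, pred[r]])
        new_gamma = gain * (phi * p_seq) @ q_pair        # sum over q
        if np.array_equal(new_gamma, gamma):
            return gamma, step
        gamma = new_gamma
    raise RuntimeError("no fixed point reached within max_iter")

# Hypothetical case: sequences (A), (A, B), (A, C); two phase-space points.
beta = np.array([[2.0, 0.0, 0.0],     # point 0 responds to A
                 [0.0, 2.0, 1.0]])    # point 1 responds to B, is subthreshold for C
gamma, steps = iterate_steady_state(beta, np.array([0.5, 0.5]),
                                    np.array([0.5, 0.25, 0.25]),
                                    pred=[-1, 0, 0], theta=2.0, gain=8.0)
```

In this example the point that initially responds only to (A, B) also comes to respond to (A, C), and the values stop changing after two iterations, well under the bound of six given by the number of sequences times the number of points.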

Let the end result of this process be $\gamma_i^{r*}$ for any unit $a_i$. We
now wish to prove that $\gamma_i^{r*}$ is a solution of the differential
equation (19.9). To begin with, we prove that $\gamma_i^{r*}$ is a minimal
solution of the equilibrium equation (19.13).

Let $\hat\gamma_i^r$ be any solution of the equilibrium equation. Then for
the iteration process, we have $\gamma_{i(0)}^r = 0 \leq \hat\gamma_i^r$ for all
$r$ and all $i$. Since the right-hand side of (19.13) is a monotone
non-decreasing function of $\gamma$, we have, writing $F_i^r(\gamma)$ for that
right-hand side,

$$\gamma_{i(1)}^r = F_i^r(\gamma_{(0)}) \leq F_i^r(\hat\gamma) = \hat\gamma_i^r$$

Similarly, $\gamma_{i(2)}^r \leq \hat\gamma_i^r$, and hence
$\gamma_{i(n)}^r \leq \hat\gamma_i^r$. Hence $\gamma_i^{r*}$ is minimal.

Now consider the differential equation (19.9)*. As long as no $\phi$ changes value, all $\delta$-functions are zero, and (19.9) simplifies to

[equation illegible in source]

where $\alpha_i^r(t) = \beta_i^r + \gamma_i^r(t)$. Thus, while the $\phi$'s are constant, the differential equation is of the form

$$\frac{d\gamma}{dt} = M - \delta\gamma,$$

where $M$ is a sum of terms of the form $P(\beta)\,\phi(\alpha)$, and is constant as long as the $\phi$'s are fixed. Thus, during this time, there is an exponential approach to the limit $M/\delta$, analogous to the solution discussed in Chapter 16 (pg. 355). Now suppose at time $t_1$ one of the $\phi$'s changes. At this point, the last term in (19.9) is infinite, and the solution is discontinuous, since the value of the connections from the incremental set $\Delta A$ has just been added to $\gamma$. Consequently, the solution takes the form shown in Figure 53.
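Between discontinuities, the equation $d\gamma/dt = M - \delta\gamma$ has the familiar closed-form solution $\gamma(t) = M/\delta + (\gamma_0 - M/\delta)\,e^{-\delta t}$. The following sketch evaluates that expression; the function name and the numerical values are illustrative assumptions, not quantities taken from the text.

```python
import math

def relax(gamma0, M, delta, t):
    """Solution of d(gamma)/dt = M - delta * gamma with gamma(0) = gamma0."""
    limit = M / delta  # asymptotic level approached while all phi's stay fixed
    return limit + (gamma0 - limit) * math.exp(-delta * t)

# Illustrative values only: starting from zero, gamma rises toward M/delta = 4.
samples = [relax(0.0, 2.0, 0.5, t) for t in (0.0, 1.0, 2.0, 4.0, 8.0)]
```

A full solution of the kind shown in Figure 53 would chain such segments, restarting each one from the value reached at the preceding discontinuity plus the jump contributed by the incremental set.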
Figure 53  FORM OF SOLUTION FOR CROSS-COUPLED $\alpha$-SYSTEMS

where

[expression illegible in source]

The middle term of this expression represents the value of $\gamma$ at time $t_1 - dt$, just prior to the discontinuity. The magnitude of $\Delta\gamma$ remains unknown, but we know that it must be non-negative, since it consists of values of A-unit interconnections which began at zero and can only have changed in a positive direction. As in the case of the iterative process, there are at most $N_s N_\beta$ times, $t_k$, at which these discontinuities can occur (one for each $\phi$-function), and each new limit is no lower than the preceding one. Moreover, the solution remains monotone increasing, despite its discontinuities. This last conclusion can be seen from the fact that the increment $\Delta\gamma$ comes from the values of a set of connections whose origins are now active for one more stimulus sequence than previously. Since no previously active A-units have become inactive (all $\phi$'s being monotone increasing) the values of these connections will not diminish, and will, in fact, tend to increase. Thus the new limit for $\gamma$ can be no lower than its present value.


Now consider the first step of the iterative process. This yields $\gamma_{i(1)}^r$ for the value of the first asymptotic level, $M_0/\delta$, for all $\gamma_i^r$ in the differential equation. This means that if any $\phi$ changes in the differential equation prior to reaching the level $M_0/\delta$, this $\phi$ must also change in the first step of the iterative process. (If no $\phi$ changes prior to the level $M_0/\delta$, then no $\phi$ will ever change, and we are at a solution for both equations.) But the new level, $M_1/\delta$, is a positive monotonic function of the $\phi$'s, and the next step of the iteration process, $\gamma_{i(2)}^r$, corresponds to the level which would have resulted had every $\gamma$ actually attained its asymptotic level $M_0/\delta$. Thus $\gamma_{i(2)}^r \ge \gamma_i^r(t_1)$ for every $r$. But from the same argument, it follows that $\gamma_{i(3)}^r \ge \gamma_i^r(t_2)$, and in general, $\gamma_{i(k+1)}^r \ge \gamma_i^r(t_k)$. Consequently, $\gamma_i^{r*} = \gamma_i^r(\infty)$, and the solutions of the two equations are indeed identical.

* It is assumed that $M$ is not identically equal to $\theta$, in which case the solutions might coincide only for $t = \infty$.
19.6  Analysis of Finite-Sequence Environments

The term "finite-sequence environment" will be used for any system in which the stream of activity is periodically interrupted, either by actively setting all $a_i$ to zero, or by introducing sequences of null stimuli of sufficient duration to allow all A-unit activity to die out of its own accord. The latter possibility exists only for systems in which the internal connection values are sufficiently small, or contain a sufficient inhibitory component, to guarantee that activity will, in fact, die away. Some idea of the conditions for this to occur may be gained from Section 18.2, and Figure 47. For convenience (and because it can always be realized, regardless of choice of parameters) the interrupted activity model will be considered here. In either case, finite-sequence environments are directly analyzable by the method of Section 19.5. Several examples are given here, based on the same stimulus environment as in Experiment 12. It will be recalled that this consisted of four stimuli, each with area .2, with non-zero intersections between $S_1$ and $S_3$ and between $S_2$ and $S_4$, all other intersections being zero. As in the example in Chapter 17, we will consider a binomial perceptron with parameters $x = 3$, $y = 0$, and $\theta = 2$, for all A-units.


EXAMPLE 1: Suppose the preconditioning sequence consists of an endless repetition of the subsequence: $S_1 S_2 S_3 S_4 \,/\, S_1 S_2 S_3 S_4 \,/\, S_1 S_2 S_3 S_4 \,/ \ldots$, where the symbol / is used to indicate points at which activity is interrupted. Then for this environment there are actually four possible sequences to be considered in the analysis, namely

    $\mathcal{S}_1 = (S_1)$
    $\mathcal{S}_2 = (S_1 S_2)$
    $\mathcal{S}_3 = (S_1 S_2 S_3)$
    $\mathcal{S}_4 = (S_1 S_2 S_3 S_4)$

each occurring with probability $P = .25$. The $\beta$-vectors for these four sequences correspond to the signals received from the terminal stimulus in each sequence, and are listed in Table 9, together with their probabilities.
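The way an interrupted stream decomposes into antecedent sequences can be sketched as follows. The stimulus labels and the order `S1 S2 S3 S4` are assumptions made for illustration; the function simply records, for every stimulus presentation, the sequence of stimuli seen since the last interruption.

```python
from collections import Counter
from fractions import Fraction

def prefix_sequences(stream, interrupt="/"):
    """Return the probability of each antecedent sequence in an interrupted stream.

    Every stimulus presentation contributes one sequence: the stimuli shown
    since the most recent interruption, ending with the current stimulus.
    """
    counts = Counter()
    current = []
    for symbol in stream:
        if symbol == interrupt:
            current = []          # activity is cut off; history is forgotten
        else:
            current.append(symbol)
            counts[tuple(current)] += 1
    total = sum(counts.values())
    return {seq: Fraction(n, total) for seq, n in counts.items()}

# Hypothetical labelling of the Example 1 stream (three repetitions shown).
stream = ["S1", "S2", "S3", "S4", "/"] * 3
probabilities = prefix_sequences(stream)
```

With four stimuli between interruptions, four distinct sequences result, each with probability 1/4, in agreement with the value $P = .25$ quoted in the example.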

$\beta$-vectors consisting only of 1's and 0's represent A-units which will always . . .

The initial Q-matrix for this experiment is precisely the same as that found for the corresponding terminal stimuli in Chapter 17, namely,

$$Q(0) = \begin{pmatrix} .104 & .000 & .034 & .000 \\ .000 & .104 & .000 & .034 \\ .034 & .000 & .104 & .000 \\ .000 & .034 & .000 & .104 \end{pmatrix}$$

It is found that no change occurs in this matrix for [condition illegible in source]. We therefore consider the case in which [value illegible in source]. In the open-loop system of Chapter 17, the sequence of Experiment 12 yielded the terminal Q-matrix:


$$\begin{pmatrix} .210 & .176 & .034 & .072 \\ .176 & .210 & .072 & .034 \\ .034 & .072 & .210 & .176 \\ .072 & .034 & .176 & .210 \end{pmatrix}$$

If we now compute the terminal matrix for a fully cross-coupled system, from Equation (19.14), we obtain:

$$\begin{pmatrix} .104 & .000 & .034 & .000 \\ .000 & .152 & .000 & .130 \\ .034 & .000 & .104 & .000 \\ .000 & .130 & .000 & .152 \end{pmatrix}$$
TABLE 9

$\beta$-VECTORS FOR STIMULI OF EXPERIMENT 12

(Parameters of A-units: $x = 3$, $y = 0$)

     β      P(β)        β      P(β)

    0000    .064       3020    .003
    0001    .048       0012    .003
    0010    .048       0021    .003
    0100    .048       0120    .003
    1000    .048       0210    .003
    0011    .024       1200    .003
    0110    .024       2100    .003
    1001    .024       1002    .003
    1100    .024       2001    .003
    0101    .072       0103    .003
    1010    .072       0301    .003
    0111    .030       1030    .003
    1011    .030       3010    .003
    1101    .030       0212    .003
    1110    .030       2120    .003
    1111    .036       0121    .003
    0003    .001       1210    .003
    0030    .001       2021    .003
    0300    .001       1202    .003
    3000    .001       1012    .003
    0303    .001       2101    .003
    3030    .001       1212    .003
    0002    .012       2121    .003
    0020    .012       1112    .006
    0200    .012       1121    .006
    2000    .012       1211    .006
    0202    .018       2111    .006
    2020    .018       0112    .006
    0201    .027       0211    .006
    0102    .027       1021    .006
    2010    .027       2011    .006
    1020    .027       1102    .006
    0203    .003       1201    .006
    0302    .003       1120    .006
    2030    .003       2110    .006
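The entries of Table 9 can be cross-checked with a short enumeration. The sketch below assumes that each of the $x = 3$ connections of an A-unit falls independently on a retinal point, and that the retina divides into seven regions whose probabilities (.4 outside all stimuli, .1 for each remaining region) are inferred here from the legible table entries such as $P(0000) = .064$ and $P(0003) = .001$; they are not quoted from Chapter 17. The names `REGIONS`, `beta_distribution`, and `q_matrix` are illustrative.

```python
from collections import defaultdict
from itertools import product

# (probability, membership in S1..S4) for each assumed retinal region.
REGIONS = [
    (0.4, (0, 0, 0, 0)),  # outside every stimulus
    (0.1, (1, 0, 0, 0)),  # S1 only
    (0.1, (0, 1, 0, 0)),  # S2 only
    (0.1, (0, 0, 1, 0)),  # S3 only
    (0.1, (0, 0, 0, 1)),  # S4 only
    (0.1, (1, 0, 1, 0)),  # intersection of S1 and S3
    (0.1, (0, 1, 0, 1)),  # intersection of S2 and S4
]

def beta_distribution(x=3):
    """Probability of each beta-vector for an A-unit with x excitatory connections."""
    dist = defaultdict(float)
    for combo in product(REGIONS, repeat=x):
        prob = 1.0
        beta = [0, 0, 0, 0]
        for p, member in combo:
            prob *= p
            beta = [b + m for b, m in zip(beta, member)]
        dist[tuple(beta)] += prob
    return dict(dist)

def q_matrix(dist, theta=2):
    """Q[i][j]: probability that a unit reaches threshold for both stimulus i and j."""
    q = [[0.0] * 4 for _ in range(4)]
    for beta, prob in dist.items():
        for i in range(4):
            for j in range(4):
                if beta[i] >= theta and beta[j] >= theta:
                    q[i][j] += prob
    return q

dist = beta_distribution()
q = q_matrix(dist)
```

Under these assumptions the enumeration produces 70 distinct $\beta$-vectors whose probabilities sum to one, and the threshold $\theta = 2$ gives the diagonal value .104 and the off-diagonal value .034 of the initial Q-matrix.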

The only change which occurs in this case is that the set $A_2$ gains a larger intersection with the set $A_4$. There is no tendency here for the A-sets responding to adjacent pairs of stimuli to merge, as would be the case in a four-layer model, or an open-loop cross-coupled network with zero transmission times. This is shown even more strikingly in the following example.


EXAMPLE 2: For the same parameters as Example 1, let us extend the basic subsequence to 8 stimuli, using as the preconditioning sequence:

    $S_1 S_2 S_1 S_2 S_3 S_4 S_3 S_4 \,/\, S_1 S_2 S_1 S_2 S_3 S_4 S_3 S_4 \,/ \ldots$

The sequences for this environment are now

    $\mathcal{S}_1 = (S_1)$                $\mathcal{S}_5 = (S_1 S_2 S_1 S_2 S_3)$
    $\mathcal{S}_2 = (S_1 S_2)$            $\mathcal{S}_6 = (S_1 S_2 S_1 S_2 S_3 S_4)$
    $\mathcal{S}_3 = (S_1 S_2 S_1)$        $\mathcal{S}_7 = (S_1 S_2 S_1 S_2 S_3 S_4 S_3)$
    $\mathcal{S}_4 = (S_1 S_2 S_1 S_2)$    $\mathcal{S}_8 = (S_1 S_2 S_1 S_2 S_3 S_4 S_3 S_4)$

Each sequence occurs with probability $P = .125$. The initial Q-matrix depends only on the terminal stimuli, and takes the form:

$$Q(0) = \begin{pmatrix}
.104 & .000 & .104 & .000 & .034 & .000 & .034 & .000 \\
.000 & .104 & .000 & .104 & .000 & .034 & .000 & .034 \\
.104 & .000 & .104 & .000 & .034 & .000 & .034 & .000 \\
.000 & .104 & .000 & .104 & .000 & .034 & .000 & .034 \\
.034 & .000 & .034 & .000 & .104 & .000 & .104 & .000 \\
.000 & .034 & .000 & .034 & .000 & .104 & .000 & .104 \\
.034 & .000 & .034 & .000 & .104 & .000 & .104 & .000 \\
.000 & .034 & .000 & .034 & .000 & .104 & .000 & .104
\end{pmatrix}$$
For the terminal matrix (again with $\eta/\delta$ = [value illegible in source]) we now have

$$\begin{pmatrix}
.104 & .000 & .104 & .000 & .104 & .000 & .104 & .000 \\
.000 & .174 & .000 & .174 & .000 & .174 & .000 & .174 \\
.104 & .000 & .174 & .000 & .174 & .000 & .174 & .000 \\
.000 & .174 & .000 & .174 & .000 & .174 & .000 & .174 \\
.104 & .000 & .174 & .000 & .174 & .000 & .174 & .000 \\
.000 & .174 & .000 & .174 & .000 & .174 & .000 & .174 \\
.104 & .000 & .174 & .000 & .174 & .000 & .174 & .000 \\
.000 & .174 & .000 & .174 & .000 & .174 & .000 & .174
\end{pmatrix}$$


This corresponds to an oscillating condition, in which each A-unit (after giving its original unaltered response to the first stimulus of the sequence) responds either 1, 0, 1, 0, 1, 0, 1 or 0, 1, 0, 1, 0, 1, 0 to the remaining seven stimuli of the sequence.


In  contrast  to  previous  models,  there  appears  to  be  a  failure  to 
associate  successive  stimuli,  and  an  association  of  every  alternate  stimulus 
instead.  Actually,  appearances  are  misleading  here;  a  strong  association  of 
successive  stimuli  is  masked  by  the  appearance  of  these  stimuli  in  the  test 
sequence  (which  is  identical,  in  this  experiment,  with  the  preconditioning 
sequence).  In  other  words,  the  perceptron  "predicts"  the  A-set  for  the  next 
stimulus  at  precisely  the  time  that  this  stimulus  actually  appears,  and  conse¬ 
quently  the  effect  of  the  prediction  is  not  detected.  The  following  experiment 
reveals  these  "hidden  associations"  in  a  striking  fashion. 


EXPERIMENT 13: Using the same four stimuli as in Experiment 12, the perceptron is shown the preconditioning sequence $S_1 S_2 S_3 S_4 \,/\, S_1 S_2 S_3 S_4 \,/ \ldots$. It is then tested with the sequence $S_1, 0, 0, 0, \ldots$, and the Q-matrix for all subsequences (from both preconditioning and test sequences) is obtained.

If this experiment is performed with $\eta/\delta = 100$, and all other parameters as before, it is found that on presenting the test sequence $(S_1, 0, 0, \ldots)$ the perceptron recapitulates the identical sequence of active sets which would have been activated had the preconditioning sequence occurred in full.* After the set corresponding to $S_4$, the system lapses into inactivity, since the preconditioning sequence is interrupted at this point.

19.7  Analysis  of  Continuous  Periodic  Environments 


Up to this point, it has been assumed that the activity of the perceptron is interrupted at least once every $m$ stimuli. We now turn to the case of a continuous, unbroken sequence of stimuli, where the activity of the association system is allowed to run on without interruption. To begin with, the case of a periodic stimulus sequence will be considered, where the preconditioning sequence takes the form:


$$S_1 S_2 S_3 \ldots S_m \; S_1 S_2 S_3 \ldots S_m \; \ldots$$

the period of the sequence being $m$. Such an environment can be considered as being composed of a set of $m$ subsequences, each of length $m + 1$. Specifically, we have the subsequences:

    $\mathcal{S}_1 = (S_1 S_2 \ldots S_m S_1)$
    $\mathcal{S}_2 = (S_2 S_3 \ldots S_m S_1 S_2)$
    . . .
    $\mathcal{S}_m = (S_m S_1 S_2 \ldots S_m)$

* This "hallucinatory recall" effect, in which the perceptron, cued by the initial stimulus of the sequence, reproduces the identical sequence of internal states which would have been activated had the stimuli continued in their usual order, is suggestive of some of Penfield's observations on hallucinatory recall of stereotyped sequences induced by electrical stimulation of brain foci in epileptics (Ref. 68).
Each sequence occurs with probability $1/m$, and each sequence begins and ends with the same stimulus.

Now since the preconditioning sequence is assumed to extend indefinitely into the past, at any arbitrary time $t$, the antecedent sequence for the first and last stimulus of any $(m + 1)$-subsequence is the same; consequently the $\gamma$ for the first stimulus equals the $\gamma$ for the last, for all $t$. But this means that there are, in fact, only a finite number ($m$) of $\gamma$'s for any A-unit, $a_i$, so that the steady-state value of $\gamma$ can be computed exactly by equation (19.14), where the sequence index is interpreted to mean the sequence $\mathcal{S}_r$ in the set of $m$ subsequences specified above.
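The decomposition of a periodic environment into $m$ subsequences of length $m + 1$ can be sketched directly. The function name and the stimulus labels are illustrative assumptions.

```python
def periodic_subsequences(period):
    """Return the m subsequences of length m + 1 of an endlessly repeated period.

    Subsequence k starts at the k-th stimulus of the period and runs for
    m + 1 presentations, so it begins and ends with the same stimulus.
    """
    m = len(period)
    return [tuple(period[(start + k) % m] for k in range(m + 1))
            for start in range(m)]

subsequences = periodic_subsequences(["S1", "S2", "S3"])
```

Each of the $m$ subsequences is then assigned probability $1/m$, and the finite-sequence analysis can be applied to this set.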


Several special cases are of particular interest. Consider first the case of a steadily maintained stimulus, $(S_1 S_1 S_1 \ldots)$. Substituting in (19.14), we have

[equation illegible in source]

and it is readily seen that the set of active units can never change from the initial set, since this equation yields zero unless $\phi(\beta) \neq 0$ for the first iteration. Thus for a steady stimulus, we have

$$Q_{ij}(\infty) = Q_{ij}(0)$$
Next, consider the alternating sequence $S_1 S_2 S_1 S_2 \ldots$. In this case, (19.14) takes the form

[equation illegible in source]

In this case, if either $\phi(\beta_1) = 1$ or $\phi(\beta_2) = 1$, $\gamma$ will generally be non-zero, and the system will tend to form a union of the sets initially responding to $S_1$ and $S_2$ (provided [condition illegible in source]).


Finally, consider the stimulus sequence of Experiment 12, consisting of a period of alternation of $S_1$ and $S_2$ followed by an alternation of $S_3$ and $S_4$, as described in Chapter 17. Rather than compute the entire 20 by 20 Q-matrix for Experiment 12, we present here a "miniaturized version" of this experiment, based upon the eight-stimulus sequence employed in Example 2 of the preceding section. For the continuous environment, the eight sequences will be:

    $\mathcal{S}_1 = (S_1 S_2 S_1 S_2 S_3 S_4 S_3 S_4 S_1)$
    $\mathcal{S}_2 = (S_2 S_1 S_2 S_3 S_4 S_3 S_4 S_1 S_2)$
    $\mathcal{S}_3 = (S_1 S_2 S_3 S_4 S_3 S_4 S_1 S_2 S_1)$
    $\mathcal{S}_4 = (S_2 S_3 S_4 S_3 S_4 S_1 S_2 S_1 S_2)$
    $\mathcal{S}_5 = (S_3 S_4 S_3 S_4 S_1 S_2 S_1 S_2 S_3)$
    $\mathcal{S}_6 = (S_4 S_3 S_4 S_1 S_2 S_1 S_2 S_3 S_4)$
    $\mathcal{S}_7 = (S_3 S_4 S_1 S_2 S_1 S_2 S_3 S_4 S_3)$
    $\mathcal{S}_8 = (S_4 S_1 S_2 S_1 S_2 S_3 S_4 S_3 S_4)$

It is found that in this experiment, there is no choice of parameters which will yield an increase in the Q entries linking $S_1$ with $S_2$ and $S_3$ with $S_4$ without producing a corresponding increase in the set of A-units responding jointly to all stimulus sequences. It can also be shown that no matter how far the period of the preconditioning sequence is extended (by increasing the duration of $S_1$, $S_2$ alternation and also increasing the duration of $S_3$, $S_4$ alternation) the system will never be able to selectively combine the sets $A_1$, $A_2$ and $A_3$, $A_4$ as in previous models. There is, nonetheless, a "predictive" effect which would be revealed if the stimuli were suddenly cut off, as in Experiment 13.


From this example (and those of the preceding section) it is clear that the condition for selective merging of A-sets for temporally adjacent stimuli is not as easily satisfied as in the four-layer system, or open-loop systems with zero transmission time. Experiment 14, however, illustrates a simple modification of the preconditioning sequence by which such a merger can be obtained.
EXPERIMENT 14: The same four stimuli are employed as in Experiments 12 and 13. The preconditioning sequence, however, takes the form: [sequence illegible in source], repeated ad infinitum. The terminal Q-matrix is obtained as before, for the twenty possible sequences of duration 21.

In this case, it is found that there will be a tendency for the sets $A_1$ and $A_2$ to merge, and for the sets $A_3$ and $A_4$ to merge in a separate "cell assembly".* What happens here is that the A-units responding to $S_1$ tend to be associated to the two most common successors of $S_1$ in the preconditioning sequence: namely, $S_1$ itself, and $S_2$. Similarly, $S_2$ is associated both to $S_2$ and $S_1$. Thus, when $S_1$ occurs at the start of the sequence it tends to be followed (coincident with its second appearance) by the combined set $(A_1 \cup A_2)$. When the first $S_2$ stimulus appears, $A_2$ combines with the "predicted" set, and the combined $(A_1 \cup A_2)$ set tends to persist until the first occurrence of $S_3$, at which point it may combine with the new $A_3$ set, or may become inactive, depending upon the magnitude of $\eta/\delta$. In order to prevent the original set from persisting indefinitely (since each A-set tends to predict itself, on the following cycle) $\eta/\delta$ must be kept small enough so that the $\gamma$-components alone are insufficient to activate A-units whose $\beta$-components are zero. In this case, only part of the original A-sets will be activated in the absence of the actual stimulus, but a bias will still remain in the direction of the desired combination of A-sets.

* The term "cell assembly" seems appropriate here, as the sets which are formed in the terminal state of a cross-coupled perceptron bear a close resemblance in organization and functional properties to the cell assembly concept proposed by Hebb, in Ref. 33.
In general, if each stimulus which forms part of an "event" can occur with equal probability after any other stimulus in the same event, then all of the A-sets responding to these stimuli will tend to merge, at least in part, and will be evoked by any stimulus of the event-class. This is essentially the same effect which was found for four-layer perceptrons in Chapter 16.


Actually, with the $\beta$-vectors corresponding to those in Table 9 (for A-units with only three retinal connections), the system is not well behaved in Experiment 14 regardless of the choice of threshold and $\eta/\delta$. With larger numbers of connections and the possibility of higher thresholds, however, it seems likely that the desired effect could be obtained with the preconditioning sequence given in the experiment. A $\gamma$-perceptron (or a $\Gamma$-perceptron) would probably be somewhat better behaved in this experiment, as it would tend to inhibit the sets of A-units characteristic of the first "event" once the second event began. In the $\alpha$-system, there is a strong tendency for all A-sets to merge whenever $\eta/\delta$ is sufficient to permit the merger of the desired sets.

19.8  Analysis  of  Continuous  Aperiodic  Environments 


If the preconditioning sequence is not periodic, some sort of approximation procedure must be used, if Equation (19.14) is to be applied. Two possibilities suggest themselves: First, the aperiodic sequence (if it is statistically uniform throughout) can be approximated by a periodic sequence if the period is sufficiently long to encompass all likely juxtapositions and short subsequences of stimuli. Second, we can consider all subsequences of length $m$, assigning a probability to each, and analyze the system as though we were dealing with a finite-sequence environment, consisting of the various $m$-sequences in an appropriate frequency mixture. In this case, the analysis should converge to a correct solution as $m$ becomes large, provided the original sequence is statistically uniform. If the statistical composition of the original preconditioning sequence changes over time, neither of these methods is applicable, and it seems likely that accurate solutions can then be obtained only by actually simulating the system and observing its behavior empirically.
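The second approximation, in which every subsequence of length $m$ is assigned a probability, can be sketched by counting sliding windows over an observed stimulus stream. Estimating the probabilities by relative frequency is an assumption made here for illustration; the function name is hypothetical.

```python
from collections import Counter

def window_probabilities(sequence, m):
    """Estimate the probability of each length-m subsequence by relative frequency."""
    windows = [tuple(sequence[i:i + m]) for i in range(len(sequence) - m + 1)]
    counts = Counter(windows)
    total = len(windows)
    return {window: count / total for window, count in counts.items()}

# A short illustrative stream; a real estimate would need a much longer one.
probs = window_probabilities(list("ABABAB"), m=2)
```

The resulting frequency mixture plays the role of the sequence probabilities in the finite-sequence analysis, and the estimate is meaningful only if the stream is statistically uniform, as the text requires.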

In the experiments which are of primary concern at this time, it is always possible to assume a statistically uniform preconditioning sequence, so that one of the two methods described above can be applied. In practice, this problem is likely to be soluble only for relatively small numbers of stimuli in the environment, as the Q-matrices rapidly become too large to handle in currently available digital computers. For long stimulus sequences and large numbers of stimuli, digital simulation remains the preferred technique, and this offers the additional advantage of being applicable to small perceptrons or systems where the assumption of infinitesimal transmission time is inadmissible. In the preceding examples, where theoretical values (rather than empirical values) of $Q_{ij}$ were used, $N_a$ was implicitly taken to be very large.
19.9  Cross-Coupled Perceptrons with Value-Conservation

The two types of value-conserving systems, $\gamma$-systems and $\Gamma$-systems, which were considered in Section 16.6, are also of interest in cross-coupled systems. The $\Gamma$-system, which tends to strengthen connections to the A-set responding to the most likely successor of the present stimulus, while developing inhibitory connections to the A-units responding to unlikely successors, appears to be the more promising of the two. In most environments, however, both systems will probably show similar phenomena, provided transitions between stimuli can occur symmetrically in either direction. The analysis of the $\gamma$-system, which is somewhat more familiar from previous work, will be considered first.

19.9.1  Analysis of $\gamma$-systems

In the $\gamma$-perceptron, the total value of the set of input connections to each A-unit is conserved. Specifically, (assuming the system to be fully coupled) the change in the value of connection $v_{ij}$ is given by

[equation illegible in source]    (19.15)

Instead of (19.9), this leads to the differential equation:

[equation illegible in source]    (19.16)

. . . solution for the $\gamma$'s. The task is complicated in this case, however, by the presence of the unknown quantities $\Delta\gamma(t)$ in the equation, which we have not hitherto had to evaluate.


For the $\gamma$-system, any equilibrium equation must be of the form:

[equation illegible in source]    (19.17)

where $A$ = set of active A-unit sets, $A_i$, for which the value of $\gamma(\infty)$ is computed. As long as all $\phi$'s remain fixed, the $\gamma$'s will tend exponentially towards such an equilibrium condition, as in previous models. Now consider the set of units whose $\phi$'s change value at time $t_k$. We wish to find the asymptotic value of the change in $\gamma$ due to adding or subtracting this set of active units to the set $A$ at time $t_k$. This is equal to the difference between the asymptotic value of $\gamma$ based on the new set of active units, $A(t_k)$, and the asymptotic value based on the old set of active units, $A(t_{k-1})$. Specifically, from (19.17), and with an obvious extension of previous notation,
[equation illegible in source]    (19.18)

With this equation for the asymptotic value of the "incremental set" of A-units which become active (or inactive) at time $t_k$, it becomes possible to compute the time-dependent solution in much the same manner as for the four-layer perceptron. To begin with, we obtain the functions defined in equation (16.23) for all units, and thus determine the next time, $t_k$, at which some $\phi$ changes. This gives us the values of $\phi$ which are required in equation (19.18). We then compute the actual value of $\Delta\gamma(t_k)$ as follows. The contribution, $\Delta\gamma$, being composed of a number of individual values, $v_{ij}$, will approach its asymptotic value exponentially, with the same time-constant as the $\gamma$'s. Thus, if we can determine the value of the set of contributing connections at the start of the interval (time $t_{k-1}$) we can determine its value at time $t_k$. Now the value at $t_{k-1}$ is simply the sum of the $v_{ij}(t_{k-1})$ for all $j$ such that $\phi_j$ changes at $t_k$. We will use the notation [symbol illegible in source] for this starting value. Specifically,

[equation illegible in source]    (19.19)

* To avoid computing [quantity illegible in source], an approximation is required, e.g., [expression illegible in source].
Then, by analogy to (16.24), we have

[equation illegible in source]    (19.20)

Thus, the complete solution for $\gamma$ at time $t$ (including the discontinuity at the terminal end of the interval) is given by:

[equation illegible in source]    (19.21)

The value of the discontinuity time, $t_k$, is obtained as before, from equation (16.25).

This completes the analysis of the cross-coupled $\gamma$-system. While no cases have actually been computed at the present time, it seems likely that this system will generally be better behaved than the $\alpha$-system, particularly in such problems as Experiment 14, where there is a tendency for all A-sets to merge under $\alpha$-system dynamics.


19.9.2  Analysis of $\Gamma$-systems

In the $\Gamma$-system, where the value is conserved over the set of output connections from each A-unit, the change in the value of the connection $v_{ij}$ is now

[equation illegible in source]    (19.22)

This leads to the differential equation and equilibrium equation, respectively,

[equation illegible in source]    (19.23)

[equation illegible in source]    (19.24)

From these equations, a solution for $\gamma$ can clearly be computed along the same lines as in the previous section, for the $\gamma$-system. Specifically, the asymptotic value for the connections from the difference set takes the form:

[equation illegible in source]    (19.25)

$\Delta\gamma(t_k)$ can be computed by equations (19.19) and (19.20) without any modification, so that the final solution can be obtained as before from Equation (19.21).

Due to its apparent superiority as a predictive system, and since it appears to have the same advantages in stability of the A-set organization as the $\gamma$-system, this model seems likely to be the most versatile system analyzed thus far.
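The two conservation constraints described in this section can be illustrated with a minimal sketch. It is not a reconstruction of the reinforcement rules of this section; it assumes one simple rule of the required kind, in which the usual increment is offset by its mean over the conserved set of connections. The array layout (`V[i, j]` for the connection from unit $i$ to unit $j$), the activity vectors, and the function names are all hypothetical.

```python
import numpy as np

def gamma_update(V, a_prev, a_now, eta):
    """Assumed gamma-type rule: the total value of the inputs to each unit is conserved.

    The increment for connection i -> j is eta * a_now[j] * (a_prev[i] - mean(a_prev)),
    so the increments into any unit j sum to zero.
    """
    centered_origin = a_prev - a_prev.mean()
    return V + eta * np.outer(centered_origin, a_now)

def capital_gamma_update(V, a_prev, a_now, eta):
    """Assumed Gamma-type rule: the total value of the outputs from each unit is conserved.

    The increment for connection i -> j is eta * a_prev[i] * (a_now[j] - mean(a_now)),
    so the increments out of any unit i sum to zero.
    """
    centered_target = a_now - a_now.mean()
    return V + eta * np.outer(a_prev, centered_target)

V0 = np.zeros((4, 4))
a_prev = np.array([1.0, 1.0, 0.0, 0.0])   # units active at the earlier time
a_now = np.array([0.0, 1.0, 1.0, 0.0])    # units active at the present time
V_gamma = gamma_update(V0, a_prev, a_now, eta=0.005)
V_capital = capital_gamma_update(V0, a_prev, a_now, eta=0.005)
```

In the second rule, connections from a previously active unit to currently active units grow while its connections to inactive units become negative, which matches the qualitative description of the $\Gamma$-system as strengthening links to likely successors and inhibiting unlikely ones.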

19.10  Similarity  Generalization  Experiments 


The consideration which first drew attention to the importance of cross-coupled perceptrons was the prediction by Rosenblatt (Ref. 85) that such networks would be capable of improving their performance in similarity generalization, as a result of prolonged exposure to an environment in which stimuli are more likely to be succeeded by their transforms than by unrelated stimuli. In Chapter 16, it was shown that a suitably organized four-layer perceptron has such a capability, and the above analysis shows that for sequences in which the activity of a cross-coupled perceptron is interrupted after every other stimulus, its performance should be equivalent to the four-layer model. Thus the original prediction appears to be upheld.

The mathematical analysis of cross-coupled networks has been completed too recently to permit detailed examples of similarity generalization to be worked out at this time. A series of simulation experiments has been completed, however, employing a program written by Trevor Barker for the IBM 704. In this program a fully coupled network of 102 association units is represented, with $\gamma$-system dynamics. The model differs from those analyzed above, in that the values do not decay. This leads to "instability" of the system (a tendency to go into terminal oscillatory modes with massive A-unit activity, unrelated to the stimuli which are presented), unless some additional measures are taken to limit the growth of the connection values. The program was therefore modified for bounded values. In order to prevent the tendency of the $\gamma$-system to turn off most of the initially responding A-units after the first few preconditioning stimuli, a further modification was included to permit half-integer values for $\theta$. Thus the values of the cross-coupling connections have no effect until the magnitude of $\gamma$ is at least equal to 1/2.
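The two program modifications can be illustrated with a small sketch. It assumes that a unit is active when $\beta + \gamma \ge \theta$ with integer $\beta$, and that values are clipped symmetrically; the lower bound of $-0.2$ is an assumption, since only the upper bound of .2 is stated for the experiments. The function names are hypothetical.

```python
def is_active(beta, gamma, theta=1.5):
    """Threshold test with a half-integer threshold (assumed rule: beta + gamma >= theta)."""
    return beta + gamma >= theta

def bounded_update(value, increment, lower=-0.2, upper=0.2):
    """Apply an increment and clip the result to the allowed range of values."""
    return max(lower, min(upper, value + increment))
```

With $\theta = 1.5$, a unit receiving $\beta = 1$ from the retina stays inactive until the cross-coupled contribution reaches one half, while a unit with $\beta = 2$ fires regardless of $\gamma \ge 0$.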
Even in this modified program, performance is considerably
poorer  than  might  be  expected  of  the  decaying  value  models,  since  the  system 
ultimately  goes  to  a  saturation  condition,  with  all  values  either  at  the  upper 
or  lower  bound.  Prior  to  this  saturation  state,  however,  (and  to  a  lesser 
degree  even  in  its  saturated  condition)  similarity  generalization  can  be 
successfully  demonstrated,  as  in  the  following  experiments. 

Figure 54 shows the results of two experiments, with five
excitatory and five inhibitory retinal connections to each A-unit, $\theta = 1.5$,
$\eta = .005$, and an upper bound of .2 for all values. In each case, the
preconditioning sequence consisted of random stimuli, alternating with their
transforms. The transform, $T(S)$, consisted of a displacement of $S$
by half the width of the retina. The retina itself was a 4 by 36 mosaic
(144 points), and all stimuli covered one fourth of these points. In the first
experiment, the preconditioning stimuli consisted of random "salt and pepper
patterns", in which any combination of points is equally likely. In the second
experiment, the stimuli were constructed by a "blob generating program" which
produces coherent, but randomly shaped patterns such as those illustrated in
the figure. The test stimuli, in each case, consisted of the same set of ten
coherent patterns (rectangular designs). After being exposed to the
preconditioning sequence $S_1, T(S_1), S_2, T(S_2), S_3, T(S_3), \ldots$, the
activity of the A-system is interrupted, and a G-matrix is computed for the
twenty sequences:

$$S_1 S_1,\quad T(S_1)\,T(S_1),\quad S_2 S_2,\quad T(S_2)\,T(S_2),\quad \ldots,\quad S_{10} S_{10},\quad T(S_{10})\,T(S_{10})$$

Figure 54 CROSS-COUPLED PERCEPTRON SIMULATION EXPERIMENTS


This G-matrix indicates which of the ten transforms would be identified
correctly if the perceptron were trained to recognize their images, by means
of a single reinforcement. Sequences of duration 2 are used, to provide time
for impulses to propagate over the cross-connections before testing the
response.
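The construction of the "salt and pepper" preconditioning sequence can be
sketched as follows. This is only an illustration of the description above, not
the IBM 704 program itself. The function names are hypothetical, and the
wrap-around of the displacement at the edge of the retina is an assumption
which the text does not state.

```python
import random

ROWS, COLS = 4, 36            # retina described in the text: a 4 by 36 mosaic
N_POINTS = ROWS * COLS        # 144 sensory points
N_ON = N_POINTS // 4          # each stimulus covers one fourth of the points


def salt_and_pepper(rng):
    """Return a random stimulus as a set of (row, col) points.

    Every combination of N_ON points is equally likely.
    """
    cells = [(r, c) for r in range(ROWS) for c in range(COLS)]
    return set(rng.sample(cells, N_ON))


def transform(stimulus):
    """Displace a stimulus by half the width of the retina.

    Assumption: points pushed past the edge wrap around to the other side.
    """
    shift = COLS // 2
    return {(r, (c + shift) % COLS) for (r, c) in stimulus}


def preconditioning_sequence(n_pairs, seed=0):
    """Build the sequence S1, T(S1), S2, T(S2), ... as a flat list."""
    rng = random.Random(seed)
    sequence = []
    for _ in range(n_pairs):
        s = salt_and_pepper(rng)
        sequence.extend([s, transform(s)])
    return sequence
```

Under the wrap-around assumption the transform is its own inverse, so that
applying it twice returns the original stimulus.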


The curves show the mean performance of ten perceptrons over
the set of ten test transforms, as a function of the number of preconditioning
stimuli. In the case of the coherent stimuli, note that learning is more
rapid, and saturation is reached more quickly, than with the random stimuli
(where the saturation condition has not been reached even after 5000
preconditioning stimuli). While the peak performance level is less than .60, a
statistical evaluation of the data reveals that the trend is definitely significant.
All ten perceptrons, individually, showed a trend in the expected direction,
so that the chance of obtaining these results accidentally would be less than
.001. It should be noted that since the expected generalization coefficient
from a stimulus to its disjoint transform is negative (in a $\gamma$-system),
these perceptrons had to overcome an initial negative bias before achieving
even the "chance" level of 50% correct identifications.
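The text does not say which statistical test was applied. One reading which is
consistent with the quoted figure is a simple sign test: if an upward and a
downward trend were equally likely for each perceptron, the chance that all
ten would agree in the expected direction is $(1/2)^{10}$. The sketch below,
with a hypothetical function name, carries out this arithmetic.

```python
from fractions import Fraction


def prob_all_same_direction(n_perceptrons):
    """Chance that every one of n independent perceptrons shows an upward
    trend, assuming upward and downward trends are equally likely."""
    return Fraction(1, 2) ** n_perceptrons


p = prob_all_same_direction(10)
print(p, float(p))   # 1/1024 0.0009765625
```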

These experiments confirm the predicted tendency of cross-coupled
perceptrons to generalize on the basis of similarity, in a suitably
organized environment. They also indicate the advantage of coherent over
random stimuli, which is more pronounced in larger retinas than that
illustrated. Doubling the number of retinal points would virtually eliminate
the trend which is found for random stimuli, while the coherent stimulus
curve would be relatively unaffected. All of these results are consistent
with the laws of similarity generalization which were tentatively proposed in
Section 15.4.

Until further empirical studies are completed, the theoretical
results obtained for cross-coupled systems should still be interpreted with
caution. There is at present no knowledge of the variance in performance
over perceptrons, and how this relates to the size of the system; nor can
we estimate the effects of finite stimulus sequences, in which the assumption
of an infinitesimal rate of reinforcement per stimulus is not fully justified.
The equations of the preceding sections represent limiting behavior for large
values of $N_a$, very gradual memory modification, and very long training
sequences. The assumption of large $N_a$ can be obviated by writing the
equations with empirically measured vectors for a particular perceptron,
but in this case the results can be generalized only by means of an empirical
sampling procedure, with many such perceptrons. The given equations
will probably be found to yield correct qualitative results, but
considerable work is still required to test their quantitative accuracy.

19.11 Comparison of Cross-Coupled and Multi-Layer Systems

In similarity generalization experiments, it has already been
observed that there is a marked similarity between the performance of the
four-layer perceptron of Chapter 16, the open-loop cross-coupled system
of Chapter 17, and the closed-loop cross-coupled systems considered above.
All of these systems are capable of learning to associate patterns which occur
frequently in temporal succession, and abstracting the principle of similarity
from a transformation sequence (in which stimuli alternate with their
transforms). All of these systems will tend to work better with coherent
patterns than with random point patterns. In all cases, the constant Ng
determines the nature of the terminal G-matrix which is obtained, for a
given environment. Actually, an exact equivalence is found between the
performance of the fully cross-coupled system in finite-sequence environments,
with sequences of two stimuli, and the performance of the open-loop
system of Chapter 17 with $T = 1$. Suppose the system of Chapter 17 is
extended to include an infinite number of A-sets, each with identical
connections from the retina, and with variable connections to each unit in the
$k$th A-set from each member of the $(k-1)$th A-set (and allowing unit time
delay in transmission). It can then be shown that the states of the $k$th A-set
for the first $k$ stimuli in the sequence will correspond exactly to the states of
the equivalent fully cross-coupled model (having all S-A connections equivalent
to those in the open-loop model). Thus, the fully cross-coupled model,
considered through all time, is equivalent to the output of an infinitely extended
open-loop model, of the type discussed in Chapter 17.
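The equivalence just described can be checked numerically in a much
simplified setting. The sketch below assumes simple threshold A-units, fixed
(non-adapting) connection values shared by every A-set, and an initially silent
network; none of these details is taken from the analysis itself, and all
names are hypothetical. It compares the state of a cross-coupled A-set after
each stimulus with the state of the $k$th A-set of an extended open-loop chain
at the time of the $k$th stimulus.

```python
import random


def step(retina_signal, prev_state, values, theta):
    """One time step for a set of threshold A-units.

    values[j][i] is the value of the connection from unit j to unit i.
    """
    n = len(retina_signal)
    new_state = []
    for i in range(n):
        cross = sum(values[j][i] * prev_state[j] for j in range(n))
        new_state.append(1 if retina_signal[i] + cross >= theta else 0)
    return new_state


def run_cross_coupled(signals, values, theta):
    """States of a single cross-coupled A-set after each stimulus."""
    state = [0] * len(values)
    history = []
    for beta in signals:
        state = step(beta, state, values, theta)
        history.append(state)
    return history


def run_open_loop(signals, values, theta):
    """k-th entry: state of the k-th A-set at the time of the k-th stimulus."""
    n = len(values)
    n_steps = len(signals)
    # layers[k] is the current state of A-set k; A-set 0 is a silent dummy.
    layers = [[0] * n for _ in range(n_steps + 1)]
    diagonal = []
    for t, beta in enumerate(signals, start=1):
        # Update deeper sets first, so that each set reads the state its
        # predecessor had one time unit earlier (unit transmission delay).
        for k in range(n_steps, 0, -1):
            layers[k] = step(beta, layers[k - 1], values, theta)
        diagonal.append(layers[t])
    return diagonal


rng = random.Random(1)
n_units, n_stimuli, theta = 8, 5, 1.5
values = [[rng.uniform(-1, 1) for _ in range(n_units)] for _ in range(n_units)]
signals = [[rng.randint(0, 3) for _ in range(n_units)] for _ in range(n_stimuli)]
print(run_cross_coupled(signals, values, theta) ==
      run_open_loop(signals, values, theta))   # True
```

The agreement follows by induction on the number of stimuli, since each set in
the chain applies to its predecessor exactly the rule which the cross-coupled
set applies to its own previous state.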

While these similarities would lead us to expect basically
similar behavior in most problems for these different types of systems, some
noteworthy differences do exist between the cross-coupled system and
multi-layer systems with finite numbers of layers. First of all, there is an
inherent sequence-dependence in the cross-coupled model, which makes its present
state a function of the recent succession of events (i.e., stimuli) rather
than just the last event to occur. This means that all cross-coupled
systems have some capability for temporal pattern recognition, even without
variation in the transmission times of the input connections. Secondly, the
cross-coupled systems are likely to reach their terminal condition more
rapidly, and with initially accelerating rates of adaptation, since the
differential equation depends on changes both in the transmitting and receiving
sets of A-units, while in the four-layer model, the differential equation
depends only on changes in the receiving set, the transmitting set being fixed
for all time. The dependence on both receiving and transmitting sets makes
the cross-coupled system more subject to "instability" phenomena, and
probably tends to reduce the "dynamic range" of the system (as a function of
[symbol illegible]) in most cases. These phenomena have not yet been studied
sufficiently to present conclusive quantitative results at this time.

A more important difference than any of the above may be
potentially present, although this remains in the realm of speculation at
present. In a value-conserving cross-coupled perceptron, where there is
the possibility of developing pronounced inhibitory interaction between A-sets,
there is a tendency to develop "cell assemblies" (in Hebb's sense), and these
cell-assemblies tend to rival one another for dominance at all times. It
seems possible that such a phenomenon may provide a basis for
figure-ground separation in complex sensory fields, where it is desired that the
system attend to one object, or component of the input situation, and ignore
the remainder. This will be discussed further in Part IV. If such an effect
can be demonstrated, many of the remaining problems in the design of a
perceiving system would be solved.


20. PERCEPTRONS WITH CROSS-COUPLED S AND R-SYSTEMS


A number of interesting effects may be obtained by cross-coupling
the S-units or R-units of a perceptron. Several such systems are considered
briefly in this chapter. The first section deals with cross-coupled sensory
systems; the second section deals with cross-coupled R-systems. Detailed
analyses are not presented here, although several analytic studies are
available in the referenced literature.

20.1 Cross-coupled S-units

If the sensory units are arranged in a two dimensional array,
or retina, then it has been proposed that inhibitory interconnections between
each S-unit and its nearest neighbors will tend to inhibit activity most
strongly in the center of a field of illumination, and less around the edges.
Such a system should lead to accentuated edges or boundaries for a visual
pattern, reducing the relatively redundant information coming from interior
regions. Systems utilizing this principle have been proposed by Taylor
(Ref. 99), by Inselberg, Löfgren, and von Foerster (Ref. 4), and by a
number of others. The Inselberg-Löfgren-von Foerster treatment includes
a more detailed quantitative analysis than was hitherto available, including
cases in which the probability of interconnection of two units is a Gaussian
function or an exponential function of the distance between them.
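The edge-accentuating effect of nearest-neighbor inhibition can be illustrated
on a single row of sensory units. The sketch below is a deliberately crude
one-pass approximation, in which each unit is inhibited in proportion to the
illumination falling on its two neighbors; the inhibitory weight of 0.5 and the
function name are assumptions chosen for the illustration, and none of the
cited treatments is reproduced here.

```python
def lateral_inhibition(illumination, weight=0.5):
    """Output of a row of S-units with nearest-neighbour inhibition.

    Each unit is excited by its own input and inhibited by the inputs of
    its left and right neighbours; negative net signals give zero output.
    """
    n = len(illumination)
    output = []
    for i in range(n):
        left = illumination[i - 1] if i > 0 else 0
        right = illumination[i + 1] if i < n - 1 else 0
        net = illumination[i] - weight * (left + right)
        output.append(max(0.0, net))
    return output


row = [0, 0, 1, 1, 1, 1, 1, 0, 0]
print(lateral_inhibition(row))
# [0.0, 0.0, 0.5, 0.0, 0.0, 0.0, 0.5, 0.0, 0.0]
```

Units in the interior of the illuminated patch are inhibited from both sides
and fall silent, while the two units at its boundary, which have only one
illuminated neighbor, remain active.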

While it appears that contour detectors can indeed be constructed
by this means, it should be noted that some information is lost in the
process; namely, the indication of the direction of the illumination gradient
across the contour. Thus if a square patch of illumination is operated upon
by the network to yield a square outline, there is no way to tell whether
the inside of the square was light and the outside dark, or vice versa. The
contour-detectors proposed by Rosenblatt in Ref. 79, which consist of
A-units with circular or elliptical distributions of origin points, with
slightly different centers for excitatory and inhibitory origin clusters, still
preserve this gradient information.**

* See also Chapter 23, on visual analyzing mechanisms.
** See also Hubel, Ref. 113, for relevant biological evidence.

A somewhat more interesting possibility has been demonstrated
by Inselberg et al., in which three layers of units with anisotropic connections
are superimposed on one another, with a rotation of the axes of symmetry by 60°
in the successive layers. With such a system, it appears to be possible to
construct a network from which there is zero output from a straight-line
stimulus (regardless of its orientation) but a non-zero output from a curved
line. Such systems clearly deserve more study as possible stimulus analyzing
mechanisms for reducing the input data to a perceptron.

Systems with excitatory interconnections between S-units are of
relatively little interest, as such a network would generally lead only to a
spread of activity from the stimulus region. The only useful function which
such connections might have would be in smoothing irregular or broken
images, by filling in holes and gaps; such an application, however, seems
to be of questionable utility at the present time.

20.2 Cross-coupled R-units


Inhibitory interconnections between R-units may be useful in
several ways. One application is to guarantee that no more than one R-unit
can be "on" at any time. For this purpose, all R-units are given inhibitory
interconnections to all the others; whichever unit first goes on inhibits all
the others, holding them off. Such a system will tend to "hang up" in this
state, until the positive signal to the first R-unit is reversed, permitting
some other unit to come on. If the speed of response of an R-unit is
proportional to the magnitude of its input signal, such a scheme can be used
to select the R-unit with maximum input from a given stimulus.
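The selection of the R-unit with maximum input can be sketched as a race
between the units. The latency rule used below (a unit with positive input
$x$ goes on after a time $1/x$) is an assumption introduced to make "speed
proportional to input" concrete, and the function names are hypothetical.

```python
def first_unit_on(input_signals):
    """Index of the R-unit which switches on first, or None if none can.

    Assumption: a unit with positive input x responds after a latency 1/x,
    so that its speed of response is proportional to its input signal.
    """
    latencies = {i: 1.0 / x for i, x in enumerate(input_signals) if x > 0}
    if not latencies:
        return None
    return min(latencies, key=latencies.get)


def response_state(input_signals):
    """R-state after the first unit to go on has inhibited all the others."""
    winner = first_unit_on(input_signals)
    return [1 if i == winner else 0 for i in range(len(input_signals))]


print(response_state([0.4, 2.5, 1.1, -0.3]))   # [0, 1, 0, 0]
```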

In R-controlled reinforcement systems, inhibitory connections
between R-units may sometimes be employed to guarantee that a unique
response is associated to each new stimulus in succession. Suppose there
are four stimuli, which activate disjoint or nearly disjoint sets of A-units.
Let there be four R-units, with inhibitory connections as follows:

[Diagram: inhibitory connections from each R-unit to all of the R-units which
follow it.]

In this scheme, unit $R_i$ inhibits (absolutely) all successive R-units
($R_{i+1}, R_{i+2}, \ldots$). Now if stimulus $S_1$ occurs, and transmits an
initially positive signal to all R-units, only $R_1$ can go on. With an
R-controlled value-conserving system (in which the sum of values over all
connections is held constant) $S_1$ will then develop an excitatory signal to
$R_1$, and negative signals to all other R-units. At the same time (since
we have assumed essentially disjoint A-sets) the value-conserving system will
guarantee that the response generalizes negatively to all other stimuli.
Thus, when $S_2$ occurs, it will tend to turn off $R_1$, but will try to turn
on $R_2$, $R_3$, and $R_4$. Of these, only $R_2$ can remain on, due to the
inhibitory coupling, so that $S_2$ (or whichever stimulus occurs second in the
sequence) will become associated to $R_2$. Similarly, $S_3$ is associated to
$R_3$, and $S_4$ to $R_4$.

This scheme becomes somewhat less trivial if it is applied to
the four-layer perceptrons of Chapter 16, subsequent to a preconditioning
sequence in which the perceptron has learned to associate a unique A-set
to each similarity class of stimuli in a given environment. The above
method can then be employed to assign a unique response to each class of
stimuli (provided the terminal A-sets have sufficiently small intersections).
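The four-stimulus assignment scheme described above can be caricatured in a
few lines. The sketch does not model the A-units or the value-conserving
memory dynamics. It only records the net signal from each stimulus to each
R-unit and applies the outcome stated in the text: the active stimulus gains
an excitatory signal to the unit which is on and negative signals to the
others, and every other stimulus acquires a negative signal to that unit. The
increments of +1 and -2 and all names are illustrative assumptions.

```python
def assign_responses(presentation_order, n_units=4):
    """Return {stimulus: index of the R-unit which becomes associated to it}."""
    stimuli = sorted(set(presentation_order))
    # signal[s][r]: net signal from stimulus s to R-unit r, initially positive.
    signal = {s: [1.0] * n_units for s in stimuli}
    assignment = {}
    for s in presentation_order:
        # Chain inhibition: the lowest-numbered unit with a positive input
        # goes on and holds all successive units off.
        on = next((r for r in range(n_units) if signal[s][r] > 0), None)
        if on is None:
            continue
        assignment[s] = on
        # Assumed outcome of R-controlled reinforcement for the active stimulus.
        for r in range(n_units):
            signal[s][r] += 1.0 if r == on else -2.0
        # Assumed negative generalization to all other stimuli.
        for other in stimuli:
            if other != s:
                signal[other][on] -= 2.0
    return assignment


print(assign_responses(["S3", "S1", "S4", "S2"]))
# {'S3': 0, 'S1': 1, 'S4': 2, 'S2': 3}
```

Whichever stimulus is presented first takes the first R-unit, the second
stimulus takes the second unit, and so on, which is the property claimed for
the scheme.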

While the interconnection schemes proposed here for S and
R-units are occasionally useful for control purposes, they do not introduce
any fundamentally new properties of importance. The most striking phenomena
to be found in cross-coupled systems are the similarity generalizing
capabilities of the cross-coupled association systems -- with the tantalizing
possibility of a figure-ground mechanism still to be investigated in future
work.


PART IV


BACK-COUPLED PERCEPTRONS AND PROBLEMS


FOR FUTURE STUDY


21. BACK-COUPLED PERCEPTRONS AND SELECTIVE ATTENTION


In Parts II and III of this volume, we have tried to establish the
fundamental properties of two topological classes of perceptrons:
series-coupled and cross-coupled systems. While the possible configurations of
these two types of perceptrons have by no means been exhausted, the most
general forms of series-coupled and cross-coupled networks appear to be
sufficiently well understood so that their principles can now be applied to the
analysis of more elaborate systems. The most general network is achieved
with the addition of back-coupling (Definition 26, Chapter 4), so that layers
of units which are relatively remote from the sensory end of the perceptron
can modify the activity of layers which are relatively close to the sensory
end. Given this additional mode of coupling, then virtually all perceptrons
of interest, however elaborate their structure, can be regarded as compounds
or modifications of the types previously considered.

The modulating effect of back-coupling upon the behavior of a
perceptron will be considered qualitatively in this chapter. It will be seen
that while the analysis of such systems can frequently be carried out in terms
of already established principles, their behavior possesses a new order of
sophistication. In particular, the psychological phenomena of selective
attention and "cognitive set" now begin to emerge. A related exposition of
these ideas can be found in Rosenblatt, Ref. 79, Chapter X.


21.1 Three-Layer Systems With Fixed R-A Connections


21.1.1 Single Modality Input Systems


The first case to be considered is the class of three-layer
perceptrons having fixed-value connections from the R-units back to the
A-units. For simplicity, it is assumed that there is no cross-coupling
within any of the three layers. Such a perceptron with two R-units can be
represented by the symbolic diagram:

[Symbolic diagram: S-units, A-units, and two R-units, with connections from the
R-units back to the A-units.]

where solid arrows represent fixed-value connections, and broken lines
represent variable-valued connections. In particular, assume that there is
a connection from every R-unit back to every A-unit, half of these connections,
chosen at random, having the value +1, and the other half having the value -1.
In the following section it will be assumed that the R-units are of an "on-off"
variety (having the outputs 1 or 0, rather than +1 and -1) although analogous
effects can be found for simple R-units. It is also assumed, for the sake of
avoiding impossible closed-loop situations, that all connections have a short
time delay, $\tau$; a stimulus, however, is generally assumed to be held on
the retina for a time $T \gg \tau$.


The signal which is fed back to an A-unit $a_i$ from the
response unit $r$ is given by the linear function

$$\gamma_{ri} = v_{ri}\, r^*$$

Thus $\gamma_{ri}$ is equal either to $v_{ri}$ or 0, depending on whether
$r^* = 1$ or 0. The effect of these feedback signals on the set of A-units
responding to a given stimulus is shown in Figure 55. The symbol $\beta_i$ is
used to represent the component of the input signal, $\alpha_i$, which comes to
the A-unit from the retina. It is assumed that there are two R-units, so that
there are four disjoint sets of A-units with roughly $N_a/4$ units in each set,
corresponding to the four possible combinations of $v_{r_1 i}$ and $v_{r_2 i}$.
These sets of A-units are represented by the four quadrants of the diagram. The
circles indicate the values of $\beta_i$ received from the given stimulus, in
relation to the threshold, $\theta$. The A-units in the innermost circle, for
which $\beta_i \ge \theta + 2$, will always be on when the given stimulus
occurs, regardless of the condition of the R-units. Those units for which
$\theta \le \beta_i < \theta + 2$ will be on except when they receive an
inhibitory signal from both R-units simultaneously. The units for which
$\beta_i = \theta - 1$ must receive a net excitatory signal from one or both of
the R-units in order to go on, and those units for which $\beta_i = \theta - 2$
will only go on (in the presence of the given stimulus) if they receive an
excitatory feedback signal from both R-units at once. Units for which
$\beta_i < \theta - 2$ will never respond to this stimulus. The magnitudes
of these sets can be calculated from tables of Q-functions (cf. Chapter 6
and Reference 87). The shaded area in Figure 55 shows the sets which
respond to the given stimulus when $(r_1^*, r_2^*) = (1, 1)$.

Figure 55 EFFECT OF FEEDBACK ON ACTIVITY OF A-SET, IN RESPONSE TO A GIVEN
STIMULUS, FOR PERCEPTRON WITH 2 R-UNITS. SHADING SHOWS ACTIVE
A-SETS FOR THE RESPONSE STATE $(r_1^*, r_2^*) = (1, 1)$.
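The classification of A-units by their retinal signal can be checked by direct
enumeration. The sketch below assumes that an A-unit is on whenever its
retinal signal plus the feedback from the active R-units reaches the
threshold, with feedback values of +1 or -1 and on-off R-units as described in
the text. The function names and the particular threshold of 3 are
hypothetical choices for the illustration.

```python
from itertools import product


def a_unit_on(beta, feedback_values, r_state, theta):
    """True if retinal signal plus feedback reaches the threshold."""
    gamma = sum(v * r for v, r in zip(feedback_values, r_state))
    return beta + gamma >= theta


def active_states(beta, feedback_values, theta):
    """All R-states (each output 0 or 1) for which the A-unit is on."""
    return [r for r in product((0, 1), repeat=len(feedback_values))
            if a_unit_on(beta, feedback_values, r, theta)]


theta = 3
for beta in (5, 4, 2, 1, 0):
    # The four sign combinations correspond to the four quadrants of Figure 55.
    for v in product((+1, -1), repeat=2):
        print(beta, v, active_states(beta, v, theta))
```

With these settings a unit with retinal signal 5 is on in every R-state for
every sign combination, a unit with signal 1 is on only when both feedback
values are +1 and both R-units are on, and a unit with signal 0 is never on.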


Now suppose there are two stimuli, $S_1$ and $S_2$. $S_1$ is
trained to give the response combination $(r_1^*, r_2^*) = (1, 0)$, while $S_2$
is associated to the response code $(0, 1)$. We assume that the retinal sets
representing the two stimuli are completely disjoint. Having trained the
perceptron, let us now present both stimuli simultaneously (i.e., a composite
image of $S_1$ and $S_2$ is projected on the retina). Under these conditions, a
series-coupled perceptron might equally well give the response combinations
$(1, 0)$ or $(1, 1)$. The present system, however, will
tend to respond either with $(1, 0)$ or with $(0, 1)$. In other words, it
will tend to correlate those R-states which go with one of the two stimuli,
rather than giving a partial response to each. This can be understood by
reference to Figure 56, where the A-sets responding to each of the two
stimuli are shown. For convenience, the sets responding to $S_1$ are
assumed to be disjoint from the sets responding to $S_2$, and the diagram
is simplified by assuming that the set which is active for the composite
stimulus (in the presence of a given R-state) is equal to the union of the sets
responding to $S_1$ and $S_2$ alone. This last assumption is not generally
warranted, but the qualitative conclusions reached will still be correct. The
shading shows the reinforced sets for $S_1$ and $S_2$.

Figure 56 A-SETS RESPONDING TO THE STIMULI S1 AND S2, FOR THREE RESPONSE
CONDITIONS. SHADED AREAS SHOW REINFORCED SETS, AND DOUBLE
HATCHING SHOWS REINFORCEMENT WHICH GENERALIZES TO THE
CONDITION R* = (1,0).

At the moment that the composite stimulus appears on the retina, both R-units
will be off, so that there is zero feedback to the A-system, and the total
signal coming to each R-unit from the A-system will be approximately zero
(consisting of a positive signal from one stimulus, and an approximately
equal negative signal from the other stimulus). Suppose initially, both
R-units go on. In this case, the sets of A-units responding when
$(r_1^*, r_2^*) = (1, 1)$ will become active, and the total signal to each
R-unit will still be approximately zero, so that the response state is
unstable. Alternatively, suppose the R-state goes to $(1, 0)$. In this case,
the signal to the R-units comes from the double-hatched regions of the Venn
diagram in Figure 56, and the $S_1$ set becomes "dominant". If this occurs, the
response $(1, 0)$ will tend to remain stable, and may even persist after the
stimuli are removed (provided some of the A-units have thresholds $\le 1$).
Similarly, if the R-state goes to $(0, 1)$, then the $S_2$ set becomes
dominant, and its response will tend to persist.

If either stimulus has been trained to give the response (0,0)
in the above experiment, the R-units will tend to "hang up" in their initial
condition, and no other response can ever occur to the joint stimulus.
On the other hand, it is possible to produce an oscillating or cyclical response
by training a given stimulus to give the response (1,1) when the present
response is (0,0), then conditioning the (1,1) set to give the response
(1,0), conditioning this set to give (0,1), and finally associating the
response (0,0) to the A-set responding for (0,1). In this case, as
long as the stimulus is held on the retina, the R-units will cycle through the
four responses in succession.
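The cyclical response amounts to a table which gives the next R-state as a
function of the present one, for as long as the stimulus is held on the
retina. The sketch below writes that table out directly; it abstracts away
the A-sets which carry each association, and the names are hypothetical.

```python
# Next response as a function of the present response, as trained in the text.
NEXT_RESPONSE = {
    (0, 0): (1, 1),
    (1, 1): (1, 0),
    (1, 0): (0, 1),
    (0, 1): (0, 0),
}


def response_sequence(n_steps, start=(0, 0)):
    """Successive R-states while the stimulus remains on the retina."""
    state = start
    states = [state]
    for _ in range(n_steps):
        state = NEXT_RESPONSE[state]
        states.append(state)
    return states


print(response_sequence(4))
# [(0, 0), (1, 1), (1, 0), (0, 1), (0, 0)]
```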

The important tendency which has been demonstrated for this
system is a tendency to correlate the output of the R-units so that they
all apply to a single stimulus, when a composite stimulus occurs at the
retina. This now provides the basis for the following experiment:

EXPERIMENT 15:


Using a four-R-unit perceptron, and a universe of
squares and triangles of equal area in all positions on the retina,
train the system to give the responses $(r_1^*, r_2^*) = (1, 0)$ for a
triangle, and $(0, 1)$ for a square; $(r_3^*, r_4^*) = (1, 0)$ for a
stimulus in the top half of the retina, and $(0, 1)$ for a stimulus
in the bottom half. After training with an error-correction
procedure, test the response of the perceptron to the stimuli
$S_x$ = triangle in the top half of the field and square in the bottom
half, and $S_y$ = square in the top half with triangle in the bottom
half.


In this experiment, the first pair of responses are used for
square/triangle discrimination, and the second pair for top/bottom
discrimination. For the time being, assume that the error correction procedure
is modified by forcing the correct $R^*$ condition whenever a correction is
applied. (This assumption will be dropped in Section 21.2.) It is predicted
that a back-coupled system, organized as above, will tend to give one of the
two responses (1,0,1,0) or (0,1,0,1) for stimulus $S_x$ (signifying
"triangle, top" or "square, bottom", respectively), but will give one of the
two responses (1,0,0,1) or (0,1,1,0) for stimulus $S_y$ (signifying
"square, top" or "triangle, bottom"). In other words, the system should give a
consistent description of one of the two stimuli, in terms of shape and
location, and ignore the other stimulus; it will not name the shape of one and
the position of the other, even though both shapes and both positions are
simultaneously present.

That the predicted effects will tend to occur can be seen by
referring to Figure 57, where it is assumed that the combination $S_x$
(top triangle and bottom square) occurs. Reinforcement is shown by
cross-hatching. The relative sizes of the intersections in the Venn diagram are
drawn to suggest the relative intersections of the A-sets for the response
states of interest. Note that the set responding when $R^* = (1, 0, 0, 0)$ tends
to have a relatively large intersection with the $R^* = (1, 0, 1, 0)$ set, due
to the fact that three of the four R-units are in identical states. The combined
intersection of the (1,0,0,0) set with the sets which are reinforced to
yield the "top" response (1,0) on $R_3$ and $R_4$ is greater than the combined
intersection with the sets which were reinforced for the "bottom" response.
If the triangle first becomes dominant with respect to the $R_1$, $R_2$ pair of
responses (yielding the condition (1,0,0,0)), the activated set which has
been most heavily reinforced, shown by cross-hatching, will now tend to
evoke the "top" response from $R_3$ and $R_4$, since the "top triangle" set now
carries considerably greater weight than the "bottom square" set. Thus a
consistent configuration on all four R-units is induced. If (0,1,0,0) should
occur, however, the system will have an opposite bias for $R_3$ and $R_4$,
tending to evoke the condition (0,1,0,1). If $S_y$ should occur instead of
$S_x$, the biases will be found to favor the (1,0,0,1) or (0,1,1,0) conditions,
as predicted.
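The remark that A-sets for response states which agree on three of the four
R-units overlap heavily can be illustrated by exhaustive counting. The sketch
below enumerates every pattern of feedback signs (+1 or -1 from each of four
R-units) together with a small range of retinal margins $\beta_i - \theta$,
and counts the combinations which are active under both of two response
states. Treating every margin and sign pattern as equally likely is an
assumption made only for this illustration, and the names are hypothetical.

```python
from itertools import product


def unit_on(margin, feedback_values, r_state):
    """margin = beta - theta; the unit is on if margin plus feedback >= 0."""
    return margin + sum(v * r for v, r in zip(feedback_values, r_state)) >= 0


def overlap(state_a, state_b, margins=range(-2, 3)):
    """Number of (margin, sign pattern) combinations active under both states."""
    return sum(1
               for m in margins
               for v in product((1, -1), repeat=4)
               if unit_on(m, v, state_a) and unit_on(m, v, state_b))


print(overlap((1, 0, 0, 0), (1, 0, 1, 0)))   # 40
print(overlap((1, 0, 0, 0), (0, 1, 0, 1)))   # 36
```

The set active under (1,0,0,0) shares more members with the set for
(1,0,1,0), from which it differs in one R-unit, than with the set for
(0,1,0,1), from which it differs in three.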

Figure 57 SETS AFFECTING THE TRANSITION FROM THE RESPONSE STATE (1,0,0,0)
WHEN THE COMBINED STIMULUS "TOP TRIANGLE" AND "BOTTOM SQUARE" OCCURS.
SHADING SHOWS REINFORCED SETS, AND THE MEASURES OF THE INTERSECTIONS
WITH THE (1,0,0,0) SETS ARE DENOTED BY THE LETTERS a, b, c, AND d.
THE VENN DIAGRAM IS DRAWN SO AS TO EMPHASIZE THE PROBABLE MAGNITUDES
INVOLVED.

Experiment 15 illustrates the simplest conditions under which
"selective attention" might be said to occur in a perceptron. In a complex
field, with more than one trained stimulus present, rather than giving a
conflicting mixture of responses, the perceptron tends to pick a single
familiar "object" and respond to this object to the exclusion of everything
else. By adding additional responses, a complete description might be
obtained of the shape, size, position, etc., of a single object in the field.
The particular object which is selected, however, depends on chance factors,
such as the relative amounts of reinforcement which have been applied to
different A-sets, or momentary noise within the network. In the following
section, it will be shown how a stimulus in a different modality, such as a
spoken word, can be made to direct the attention of the perceptron towards a
selected object or region in the visual field.

21.1.2 Dual Modality Input Systems


The perceptron which is illustrated in Figure 58 is similar
to the one which was described in the preceding section, except that it
possesses two sensory input systems, one visual (a retina) and the other
auditory (e.g., a filter system). There is a separate set of A-units for each
of these inputs: one forming the visual association system, and the other
the auditory association system. Again, there are four R-units, each one
receiving variable-valued connections from all A-units in both sets, and
sending a set of fixed-value connections back to all the A-units. As before,
half of the feedback connections from each R-unit are assumed to be excitatory,
and  the  remainder  inhibitory,  with  values  1  / 

Figure 58 ORGANIZATION OF A DUAL MODALITY PERCEPTRON, WITH 4 R-UNITS
(BROKEN LINES INDICATE VARIABLE-VALUED CONNECTIONS)


With  this  system,  the  following  experiment  can  be  performed: 

EXPERIMENT 16: Using a dual-modality input system (visual and
auditory), with four R-units, train the perceptron to distinguish
square/triangle and top/bottom, using the same code and
stimuli as in Experiment 15. Then, selecting four discriminable
audio-patterns, SQ, TR, T, and B, train the perceptron by
means of the audio-input to associate the responses for "square",
"triangle", "top" and "bottom" to these four stimuli. In testing
the perceptron, a composite visual stimulus, consisting of a
triangle in the top half of the field and a square in the bottom
half, is used. Simultaneously with the visual input, the audio-pattern
SQ, TR, T, or B is presented, and the response of the
perceptron is observed for each of these four conditions.

From the discussion of Experiment 15, it is clear that the
visual section of the perceptron will tend to give a consistent response of
(1,0,1,0) or (0,1,0,1), representing "top triangle" or "bottom square",
respectively. The effect of adding the audio-stimuli is to add an additional
bias to the R-units, favoring one of the four "concepts", square, triangle, top,
or bottom. For example, if the TR stimulus is applied (which has been
independently associated to the composite response (1,0,.,.)) there
will be an auxiliary positive signal to $r_1$, and an inhibitory signal to $r_2$,
coming from the auditory A-set. There will be no bias introduced on $r_3$ and
$r_4$. Consequently, the system will be biased to give the initial response
(1,0,0,0), which we have seen tends to transform itself into the stable
condition (1,0,1,0) for the given stimulus.

Thus the results which are predicted for Experiment 16 are that
when the audio-pattern TR is given, the perceptron will give the composite
response indicating the shape and position of the triangle; when SQ is
presented, the perceptron will indicate the shape and position of the square;
for the audio-input T, it will indicate the shape and location of the top
visual pattern; and for B, it will indicate the shape and location of the
bottom pattern. An audio-command can therefore be used to direct the
attention of the visual system to a specified location or a specified shape,
and the output of the perceptron will be a consistent description of the
indicated object.
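
The predicted outcome of Experiment 16 can be tabulated in a short Python
sketch. This is a toy model under stated assumptions, not a simulation of the
network: the two consistent visual states are taken from the text, while the
bias table `AUDIO_BIAS`, the function `attended_state`, and the +1/-1 scoring
are illustrative choices which do not appear in the original.

```python
# Toy tabulation of the predictions for Experiment 16 (assumed model).
# Response code: (r1, r2) encodes shape, (r3, r4) encodes position.
CONSISTENT_STATES = {
    "top triangle": (1, 0, 1, 0),
    "bottom square": (0, 1, 0, 1),
}

# Assumed bias which each audio pattern adds to the four R-units:
# +1 favors "on", -1 favors "off", 0 leaves the unit unbiased.
AUDIO_BIAS = {
    "TR": (+1, -1, 0, 0),
    "SQ": (-1, +1, 0, 0),
    "T": (0, 0, +1, -1),
    "B": (0, 0, -1, +1),
}


def attended_state(audio):
    """Return the consistent visual state which agrees best with the audio bias."""
    bias = AUDIO_BIAS[audio]

    def agreement(state):
        # Map 0/1 responses to -1/+1 and correlate them with the bias.
        return sum(b * (2 * r - 1) for b, r in zip(bias, state))

    name = max(CONSISTENT_STATES, key=lambda n: agreement(CONSISTENT_STATES[n]))
    return name, CONSISTENT_STATES[name]


if __name__ == "__main__":
    for cue in ("TR", "SQ", "T", "B"):
        print(cue, attended_state(cue))
```

Under these assumptions TR and T both select the "top triangle" state
(1,0,1,0), while SQ and B select the "bottom square" state (0,1,0,1).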


While it is possible by means of the above procedure to assign
"names" to visual objects or events, and direct the attention of the perceptron
by means of these names, it should be noted that the association is actually
much too complete for this to serve as a model for linguistic "naming behavior".
For the perceptron, there is no difference (at the response level) between the
name for an object and the object itself. Thus the audio-symbol TR and the
visual image of a triangle both turn on the same response combination
(1,0,.,.) in the experiment considered above. If it is desired to retrain
the system to associate some other visual pattern (say, "trapezoid") with
the TR symbol, it is necessary to completely eliminate the previous
association of triangles to (1,0,.,.) and train trapezoids to give this response
instead. Words and visual patterns are part of the same conceptual class, for
this perceptron, and cannot be re-associated as distinct entities, but can only
be used as raw material for building up new conceptual classes. The distinction
between the name and the visual object becomes important in practice if we
wish to tell the perceptron to "look for the square" when there is no visual
square present. The audio-symbol "look" might be used to start an automatic
scan or hunting process, but to stop the process when a square is
found, the perceptron must be capable of distinguishing between the
audio-symbol for "square" (which it must remember for the duration of the search
process to tell it what it is looking for) and the visual pattern of a "square",
which must stop the search when it appears. A perceptron which is capable
of distinguishing between symbols and objects, and is not subject to these
criticisms, will be considered in Section 21.3.

21.2 Three-Layer Systems With Variable R-A Connections


In the previous examples, the existence of a bias towards one
of the two consistent response configurations when part of the state is
achieved, is due to the fact that reinforcement is applied only in the presence
of the correct response. This means that whenever a corrective reinforcement
is applied, the reinforcement control system must first "force" the
desired response configuration. But in a simple error-correction procedure,
as this concept has been used previously, the corrective reinforcement would
normally be applied only when the response is wrong, and this would tend to
reduce the indicated bias quite drastically. For example, in Figure 56, it
can be seen that if $S_2$ had been negatively reinforced in the presence of the
R = (1,0) state, this negative reinforcement would tend to cancel the effect
of the $S_1$ signal. One method of eliminating this problem, which leads to a
system which appears to be generally better-behaved (on the basis of a
qualitative examination of its properties), is to make use of adaptive
back-connections, rather than fixed-value connections, from the R to A-units.

21.2.1  Fixed  Threshold  Systems 


The first model to be considered corresponds topologically to the
model treated in Section 21.1.1, but differs in having variable connections,
so that its symbolic diagram is of the form:

[Diagram: S -> A <-> R, with variable A-R and R-A connections]

The forward connections, from A to R-units, are assumed to follow the
usual $\alpha$-system dynamics, subject to error-correction procedures. The
back-connections, however, are subject to the $\gamma$-system rule which was
introduced for cross-coupled perceptrons. This means that the total value
of the set of feedback connections from each R-unit remains constant, but
that if both termini (the R-unit and the A-unit) are active in succession, the
connection value is incremented by a positive quantity, $\eta$. At the same
time, a proportional decay (at rate $\delta$) occurs in all active R-A
connections, so that in the absence of reinforcements, they tend to approach
zero exponentially. The net change in value of connection $v_{ri}$ at time $t$
is therefore

$$\Delta v_{ri}(t) = r^*(t-\tau)\left[\eta\left(a_i^*(t) - \frac{1}{N_a}\sum_{j} a_j^*(t)\right) - \delta\, v_{ri}(t)\right] \qquad (21.1)$$

where $N_a$ is the number of A-units receiving connections from the R-unit.


Note that in this equation decay occurs only when $r^* = 1$. This
means that the feedback signals from different R-units will have
approximately equal weight, regardless of the relative frequency
with which the R-units are used. The transmission delay, $\tau$,
is included only for conformity to previous models, and plays no
essential role here.

Assuming, as before, that each stimulus persists for a time $T \gg \tau$, the
result of this rule is to raise the value of the feedback signal to all A-units
which respond to the current stimulus, from the active R-units, and at the
same time to develop inhibitory connections to the A-units which are not
currently active. The decay guarantees that the entire system will tend
towards a dynamic equilibrium, at which the expected rate of gain just
balances the rate of decay.
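
The approach to equilibrium can be illustrated with a minimal Python sketch.
The update used here is one plausible reading of the rule described in the
text (a gain for connections to active A-units, offset so that the total is
unchanged, together with a proportional decay). The names `gamma_step` and
`run`, and the values `eta = 0.1` and `delta = 0.05`, are assumptions made
for illustration only.

```python
def gamma_step(v, a, r_active, eta=0.1, delta=0.05):
    """One update of the back-connections v[i] from a single R-unit.

    v: current connection values to each A-unit
    a: 0/1 activity of each A-unit
    r_active: whether the R-unit was active (no change otherwise)
    """
    if not r_active:
        return list(v)
    mean_a = sum(a) / len(a)
    # Gain for active A-units, balanced by an equal total loss spread over
    # all connections, minus a decay proportional to the current value.
    return [vi + eta * (ai - mean_a) - delta * vi for vi, ai in zip(v, a)]


def run(a, steps=500):
    """Apply the rule repeatedly with the R-unit held on."""
    v = [0.0] * len(a)
    for _ in range(steps):
        v = gamma_step(v, a, r_active=True)
    return v


if __name__ == "__main__":
    print(run([1, 1, 0, 0]))
```

With two of four A-units active, the values settle where gain balances decay:
excitatory connections to the active units, inhibitory connections of equal
size to the inactive units, and a total which remains at zero.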

The effect of this system is illustrated in Figure 59, which shows
the condition after associating stimulus $S_1$ to the response (1,0) and $S_2$
to the response (0,1), by an error correction procedure. This corresponds
to the same conditions as Figure 56. The sets which respond when
R = (0,0) are shown by the large circles. If these sets are initially
reinforced to yield the appropriate response for each stimulus, then when the
composite stimulus appears, they will try to turn on opposite responses, with
about equal strength. Such a condition, however, will be an unstable one. If
one of the sets, say $S_1$, carries slightly greater weight than the other,
the condition illustrated in the figure will arise. With $r_1$ on, excitatory
signals will be transmitted back to the $S_1$ set, and inhibitory signals to
all other A-units, including the $S_2$ set. Thus the $S_1$ set remains
unchanged, but the $S_2$ set is diminished. Alternatively, if $r_2$ should
gain an advantage, the $S_2$ set will tend to remain unchanged, and the $S_1$
set will be reduced.

Figure 59 A-SETS RESPONDING TO THE COMPOSITE STIMULUS. SHADING
SHOWS ACTIVE A-SETS FOR THE RESPONSE STATE (1,0).
(COMPARE Figure 56).

If we assume that the universe consists of a large number of
stimuli in each class, as in Experiments 15 and 16, the set of A-units
responding to $S_1$ would generally not be perfectly preserved, but would
be shifted to include more units which respond to many stimuli in the
class, and to eliminate those units which respond only to $S_1$. Thus
there is an additional tendency, in this system, to convert the sets of
A-units for different stimuli which have been associated to the same response,
to sets which are nearly identical. It is clear that if the procedures of
Experiments 15 and 16 are carried out with this system (but with the usual
error-correction practice of reinforcing in the presence of the wrong
responses only, rather than forcing the correct response) the results predicted
in Section 21.1 will be obtained, but with less chance of confusion or
erroneous bias due to conflicting active sets. The special property of the
variable feedback system can be characterized as a tendency to activate the
A-units responding to one of the previously trained parts of a complex
stimulus, while suppressing those A-units which respond to the remaining
parts.

21.2.2 Servo-Controlled Threshold Systems


In all perceptrons considered thus far, the thresholds of the
A-units have been assumed to be invariant over time. It is possible to vary
the effective threshold of an A-unit by adding an excitatory or inhibitory
component to its input signal. If this is done for all A-units in the system,
the result will be to increase or decrease the proportion of units which
respond to a given stimulus. If all signals and thresholds are quantized, then
the change in the active set will occur by sudden jumps; for example, the
addition of an excitatory component of $+1$ (equivalent to $\Delta\theta = -1$)
will suddenly activate all A-units whose $\alpha$-signal was equal to
$\theta_i - 1$. Such a condition would be hard to utilize effectively
for the control of activity. On the other hand, if each A-unit has a threshold
$\theta_i$ selected at random from some continuous distribution, say a Gaussian
distribution, then there will always be some A-units whose thresholds
are just below the present value of $\alpha_i$, and others whose thresholds are
just above the present value of $\alpha_i$. In this case, a slight change in $\theta$
will always yield a corresponding change in the size of the active A-set, and
the size of the active set will vary in an approximately continuous fashion
as $\theta$ is changed continuously.

Figure 60 shows a back-coupled perceptron in which the amount
of activity is continuously monitored by a servomechanism, which controls
the magnitude of the thresholds so as to keep the total activity constant.
If the fraction of active units falls below the desired level, the servo-system
transmits an excitatory signal to all A-units (equivalent to $\Delta\theta < 0$) while
if the activity rises above the desired level, an inhibitory signal (equivalent
to $\Delta\theta > 0$) is transmitted to all A-units.
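
The action of such a servo can be sketched in Python as follows. This is an
illustration under stated assumptions rather than a model of any particular
circuit: thresholds are drawn from a Gaussian distribution, the input signals
are random values scaled by a stimulus intensity, and the resting point of the
servo is located by bisection instead of simulating the feedback loop in time.
The names `active_fraction` and `servo_shift`, and all numerical values, are
hypothetical.

```python
import random


def active_fraction(signals, thresholds, dtheta):
    """Fraction of A-units whose input reaches the shifted threshold."""
    hits = sum(1 for s, th in zip(signals, thresholds) if s >= th + dtheta)
    return hits / len(signals)


def servo_shift(signals, thresholds, target, lo=-10.0, hi=10.0, iters=60):
    """Threshold shift at which activity settles at the target fraction.

    Activity can only fall as dtheta rises, so the resting point of the
    servo can be found by bisection on dtheta.
    """
    for _ in range(iters):
        mid = (lo + hi) / 2.0
        if active_fraction(signals, thresholds, mid) >= target:
            lo = mid   # enough activity: thresholds may go higher
        else:
            hi = mid   # too little activity: thresholds must come down
    return lo


if __name__ == "__main__":
    rng = random.Random(0)
    n = 1000
    thresholds = [rng.gauss(0.5, 0.1) for _ in range(n)]  # continuous spread
    drive = [rng.random() for _ in range(n)]
    for intensity in (0.5, 1.0, 3.0):
        signals = [intensity * d for d in drive]
        dtheta = servo_shift(signals, thresholds, target=0.2)
        print(intensity, round(dtheta, 3),
              active_fraction(signals, thresholds, dtheta))
```

In this sketch a stronger stimulus is met by a larger threshold shift, while
the fraction of active units stays at the target level.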

Figure 60 BACK-COUPLED PERCEPTRON WITH SERVO-CONTROLLED THRESHOLDS.


Such a system is likely to have advantages in many types of
perceptrons. Attached to a series-coupled perceptron, for example, the
$\theta$-servo can guarantee that regardless of stimulus size or intensity, the
level of A-unit activity will be optimum. In a cross-coupled system, it can
be used to prevent "blow-ups" of activity, by providing an active mechanism
for counterbalancing the growth of excitatory weights. It is worth noting
that the $\theta$-servo can substitute for inhibitory connections from the retina
to A-units, since it generally yields the condition that if stimulus $S_1$ is
a subset of stimulus $S_2$ (on the retina), the corresponding active
association set $A(S_1)$ will not be a subset of $A(S_2)$. In the back-coupled
system, the $\theta$-servo yields particularly interesting results.


Figure 61(a) shows the condition of the A-set for the same stimuli
as in Figure 59, with the R-units in the (0,0) state, so that there is no
feedback. The large circles show the sets which respond to $S_1$ and $S_2$ alone,
normalized by the action of the servomechanism. When the composite
stimulus appears, it is no longer possible for the union of the sets $A(S_1)$
and $A(S_2)$ to remain active, however; consequently the active sets
are reduced to those units (shown by the shaded areas of the diagram) for
which $\alpha_i \geq \theta_i + \Delta\theta$. Under these conditions there is still no bias
favoring the $S_1$ response or the $S_2$ response; both sets are still in
balance, and either response might occur. As before, however, this condition
tends to be unstable, and (assuming that $S_1$ and $S_2$ have been
associated to the same response codes as previously) either (1,0) or
(0,1) will tend to occur.

Figure 61 ACTIVE A-SETS FOR COMPOSITE STIMULUS, IN SERVO-CONTROLLED
BACK-COUPLED SYSTEM. ACTIVE SETS SHOWN BY SHADED AREAS.
(a) ACTIVITY STATE FOR R = (0,0)

Figure 61(b) shows the stable state of the system in which the
response (1,0) has become dominant. The servo-system is now obliged to
adjust to the effect of the excitatory signal fed back to the $A(S_1)$ set, and
the inhibitory signal to the $A(S_2)$ set. The result is that the active set is
nearly identical to the set which would be active for $S_1$ alone, the $A(S_2)$
set being virtually obliterated by the combined effect of the negative
feedback and the increased threshold. It seems likely that by strengthening
the excitatory feedback component (shown in the diagram) sufficiently,
the active set can be made to coincide perfectly with the set responding to
$S_1$ alone. Thus the effect of selecting the (1,0) response configuration is to
enable the perceptron to respond exclusively to the $S_1$ stimulus, completely
free from interference by the presence of $S_2$. Reversal of the R-state
would, of course, lead to a reversal of the A-state. These phenomena are
highly suggestive of reversible perspective and figure-ground reversal in
psychological experiments, where one of two ways of perceiving a complex
figure dominates to the exclusion of the other.
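
The selection effect just described can be reproduced in a small Python
sketch. It rests on simplifying assumptions which are not in the original:
each A-unit is given a single score standing for its input signal relative
to its threshold, the servo is represented by keeping exactly `k` units
active, and the feedback is a fixed amount added to the favored set and
subtracted from all other units. The names `top_k` and
`composite_active_set` are hypothetical.

```python
import random


def top_k(scores, k):
    """Indices of the k largest scores: the set which a servo holding
    activity at k units would leave active."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return set(order[:k])


def composite_active_set(s1, s2, k, response, feedback):
    """Active A-set for the composite stimulus under a given R-state.

    response: (0, 0) for no feedback, (1, 0) if the S1 response is on,
              (0, 1) if the S2 response is on.
    feedback: magnitude of the excitatory/inhibitory back-signal.
    Returns the active set and the sets for S1 and S2 alone.
    """
    a1, a2 = top_k(s1, k), top_k(s2, k)
    favored = {(1, 0): a1, (0, 1): a2}.get(response, set())
    scores = []
    for i in range(len(s1)):
        fb = 0.0
        if response != (0, 0):
            fb = feedback if i in favored else -feedback
        scores.append(s1[i] + s2[i] + fb)
    return top_k(scores, k), a1, a2


if __name__ == "__main__":
    rng = random.Random(0)
    s1 = [rng.random() for _ in range(200)]
    s2 = [rng.random() for _ in range(200)]
    for state in ((0, 0), (1, 0), (0, 1)):
        active, a1, a2 = composite_active_set(s1, s2, 40, state, 1.5)
        print(state, len(active & a1), len(active & a2))
```

With no feedback the active set draws on both stimuli. With a feedback
magnitude larger than the spread of the scores, the active set coincides with
the set for the favored stimulus alone, and reverses when the R-state reverses.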

In a dual-modality perceptron, the above system will work in a
similar fashion, assuming that separate $\theta$-servos are employed for the
visual and auditory channels. Thus by giving the audio symbol for square
or triangle, top or bottom, in Experiment 16, the perceptron can be directed
to attend to one of the two objects present, and will develop an A-unit state
which corresponds closely to the state which would be expected if only the
indicated object was present in the field.

21.3 Linguistic Concept Association in a Four-Layer Perceptron

In  Section  21.1.2,  it  was  noted  that  although  names  can  be 
associated  to  objects  or  visual  events  in  a  three-layer  back-coupled  model, 
so  as  to  permit  the  experimenter  to  direct  the  attention  of  the  perceptron 
selectively  to  a  named  object  in  a  compound  field  of  stimuli,  the  associations 
formed  tend  to  be  associations  of  particular  stimuli,  rather  than  universals. 
It  is  not  possible  to  change  the  name  of  an  object  (or  a  class  of  objects) 
without  actually  undoing  the  previous  perceptual  organization  of  the  stimulus 
world  for  the  given  perceptron,  and  then  reconstructing  it  in  a  new  form. 
Words  and  visual  patterns  are  not  distinguished,  at  the  response  level,  but 
are amalgamated into a common concept.


A perceptron which is capable of first forming auditory and
visual concepts, or universals, and then associating these with one another,
and which can change its "linguistic associations" without disrupting its
perceptual organization, is illustrated in Figure 62. The system has a
visual input and an audio-input, as in Figure 58. It is also equipped with
a $\theta$-servo, and the back connections to the visual A-set are variable, as in
Section 21.2. For present purposes no back-connections to the auditory A-set
are required. There are two distinct sets of R-units: one set receives
its primary inputs from the visual system, and can be associated to visual
stimuli. The second set receives its primary inputs from the audio-system,
and can be trained to respond to sound patterns, or words. (By using
a spectrum of transmission times for the sensory to A-unit connections of the
audio-system, or by means of a cross-coupled auditory A-set, the system can be
taught to recognize sound sequences, so that it need not be restricted to
momentary sound patterns.)


Figure  62  A  DUAL-MODALITY  PERCEPTRON  FOR  LINGUISTIC  CONCEPT  ASSOCIATION. 

Thus far, we have what amounts to two mutually independent
perceptrons, one for visual stimuli, and the other for auditory stimuli.
Each of these perceptrons can form classes and generalizations by means of
an error-correction procedure applied to the appropriate response sets.
The added feature, however, is the extra association layer, $A^{(2)}$, which, in
this system, comes after the R-units. The A-units in this set receive fixed
connections from the R-units (which form a sort of retina for a second-order
perceptron) and send back variable-valued $\alpha$-system connections to the
R-units. It is assumed that each R-unit (in both sets) receives connections
from all of the $A^{(2)}$ units, and that the values of these connections can be
corrected by an error-correction procedure, just as with the connections
from the first association layer.

Suppose the perceptron has already been trained to recognize
several kinds of visual objects (say squares and triangles) and has also been
trained to recognize several spoken words ("square" and "triangle") for a
variety of intonations, voice qualities, etc. During this training, the $A^{(2)}$
to R-unit back-connections have not been reinforced. Now let the perceptron
hear the word "triangle", without any visual stimulus being present. The
result will be an appropriate code-configuration in the auditory R-units, which
will induce a characteristic state of the $A^{(2)}$ system, identifying the
spoken word. By means of an error correction procedure, the perceptron
can now be biased to give the visual R-unit code for a triangle, and will
hereafter tend to prefer this response to any others when the word "triangle"
occurs. Consequently, when a composite stimulus is presented, as in
Experiment 16, together with the spoken word "triangle", the system will tend
to give the visual R response to the triangle, and due to the feedback
connections to the visual A-set, and the action of the $\theta$-servo, it will
selectively augment the inputs to those A-units which respond to the triangle,
while tending to suppress activity of A-units responding to other stimuli.
Since all idiosyncratic forms of the spoken word, and all forms of the
triangle-pattern, have been associated to identical response codes, the
association will generalize immediately over both the audio class and the
visual class of stimuli, without having to train the system with multiple
examples of each.

Thus the four-layer perceptron can be made to direct its
attention in response to spoken commands in much the same way as the
previous models, but without requiring a modification of the A-R connections,
or "perceptual organization" of the network, in forming the linguistic
association. By a similar procedure, the $A^{(2)}$ to auditory R-unit connections
can be reinforced in the presence of a visual pattern to create a bias, or
"expectancy", favoring the perception of the word corresponding to the
perceived object. By replacing the $\alpha$-system back-connections from $A^{(2)}$ to
the R-units with $\gamma$-system connections (as in Equation 21.1) the association
can be made to occur in a relatively spontaneous fashion, by presenting the
visual image together with its spoken name. The result will be a reinforcement
of the connections from the $A^{(2)}$ set which responds jointly to the visual and
auditory codes; since this set will have many units in common with the
separate audio and visual $A^{(2)}$ sets, the reinforcement will tend to
generalize, to yield the desired result.

This system can be used for the problem of searching for a
named object which is not currently present in the visual field. For this
task, one must assume that the auditory R-units are of a "flip-flop" variety,
which tend to go on and stay on when they receive a sufficient input signal,
until they are specifically cut off by a strong inhibitory signal. The system
is taught to initiate an automatically controlled search or scan procedure in
response to the spoken word "search". It is also trained (at the $A^{(2)}$ level)
to turn off the search response whenever a coincidence occurs between a
spoken name-code and the visual object-code, but to leave the search-state
alone when either the name or object, but not both, is present. Thus, given
the command "Search for square", the word "search" initiates the search
activity, and the word "square" sets the system to anticipate a square pattern.
When a square appears in the field, the $A^{(2)}$ set corresponding to the
combined object-code and word-code is activated, and transmits a strong
inhibitory signal to the search response, turning it off. It would be possible
to go a step farther, by training the perceptron (which has now isolated the
set of units responding to the square) to continuously center the image of the
square in the retina, using two continuous R-units to measure x- and
y-displacements of the image from the center of the field (as in Section 10.2).
Such a system, having found a moving stimulus, will track it and tend to
keep it centered without being confused by the presence of extraneous objects
in the field.
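
The control logic of the search task can be summarized in a short Python
sketch. It abstracts away the network entirely and keeps only the behavior
described above: a latched name-code, a search response which is started by
the word "search", and a coincidence of name and object which turns the
search off. The function `run_search` and the event format are hypothetical.

```python
def run_search(events):
    """Trace the search response over a list of (heard, seen) events.

    heard: a spoken word or None; seen: a visual object or None.
    Returns the list of search-response states (True = searching).
    """
    searching = False
    held_name = None           # flip-flop units: latch the last spoken name
    trace = []
    for heard, seen in events:
        if heard == "search":
            searching = True   # the command word starts the scan
        elif heard is not None:
            held_name = heard  # stays on until another name replaces it
        # Coincidence of the held name-code and the visual object-code
        # sends a strong inhibitory signal to the search response.
        if searching and held_name is not None and seen == held_name:
            searching = False
        trace.append(searching)
    return trace


if __name__ == "__main__":
    events = [
        ("search", None),
        ("square", None),
        (None, "triangle"),
        (None, None),
        (None, "square"),
        (None, "square"),
    ]
    print(run_search(events))
```

The search continues while only the name, or only a non-matching object, is
present, and stops at the first event in which the square appears.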

22. PROGRAM-LEARNING PERCEPTRONS


In the last chapter, we have seen that a back-coupled perceptron
can be made to attend selectively to parts of a complex field, suppressing
A-unit activity corresponding to objects other than the one attended to. In
the last few paragraphs, it was also shown that such a perceptron can be
made to anticipate decisions which are to be made at a future time, and
execute them when the appropriate perceptual conditions are met. This
lays the basis for the learning of sequential programs of responses in
perceptrons.


Programmed  activity  is,  of  course,  of  supreme  importance  in 
carrying  out  logical  sequences  or  algorithms,  as  in  a  digital  computer.  It 
also  appears  to  provide  a  possible  basis  for  the  recognition  of  highly  complex 
stimulus  configurations,  which  depend  on  relations  of  simpler  parts,  rather 
than  a  fixed  overall  shape.  The  recognition  of  a  human  form,  or  an  animal, 
is  of  this  variety.  It  is  also  possible  that  the  recognition  of  abstract  topo¬ 
logical  relations  --  a  problem  which  has  hitherto  defied  all  perceptrons 
analyzed  --  can  be  performed  by  means  of  a  suitable  programmed  sequence 
of  observations.  This  writer  has  become  increasingly  convinced  that  a 
passive  filter-type  system  (such  as  a  simple  perceptron)  cannot  be  designed 
which  will  economically  recognize  topological  abstractions  and  relations 
such  as  "A  and  B  are  disjoint"  or  "A  is  inside  B"  or  "A  is  a  closed  curve". 

On  the  other  hand,  a  perceptron  which  can  attend  selectively  to  part  of  the 
stimulus  pattern  at  a  time,  and  carry  out  a  sequence  of  observations  under 
program-control,  seems  to  offer  a  potential  solution  to  this  problem. 

22.1 Learning Fixed Response Sequences


A perceptron of the back-coupled or cross-coupled variety can
be taught to execute a fixed, stereotyped sequence of responses without
introducing any new features in the system. If the sequence $R_1, R_2, R_3$
is required, for example, when stimulus $S_1$ occurs, but the inverse
sequence ($R_3, R_2, R_1$) when $S_2$ occurs, it is only necessary to
associate the required responses to the succession of A-states which
follow the stimulus in the cross-coupled system, or to the A-states which
result from the interaction of the retinal input and the R-A feedback, in the
back-coupled system. Of these two approaches, the cross-coupled system is
more versatile, since it can be triggered by a momentary stimulus, and will
not return to an identical state if the same response condition should occur
at different points in the sequence. The cross-coupled system, however,
requires that the response sequence occur with exact timing of each element.
If the triggering or execution of each response takes an indeterminate amount
of time, then a closed-loop system of the type shown in Figure 63 would be
more appropriate. This system (which is also applicable to the recognition
of strings of sensory events, such as words or speech sounds, where each
element of the sequence is of indeterminate duration) employs an $A^{(2)}$ system
with units which tend to lock on once they are activated, unless specifically
triggered. These units are of the same variety as the "flip-flop R-units"
employed in the R-set in Section 21.3. The $A^{(2)}$ set is cross-coupled,
with fixed connections, and feeds back (with fixed connections) to the $A^{(1)}$
set.

Figure 63 FOUR-LAYER PERCEPTRON FOR RECOGNITION AND CONTROL OF R-SEQUENCES
WITH ELEMENTS OF INDETERMINATE DURATION.


When a response occurs in the R-set, it immediately triggers
the A^(2) system to assume some characteristic state. The parameters
of the cross-coupling at the A^(2) level can be so picked (e.g., by making
all interconnections inhibitory) that the system will immediately assume a
steady state, which will be held until some subsequent response occurs.
When the second response of the sequence occurs, it finds the effective
thresholds of the A^(2) units modified by the cross-coupling signals from
the units which are already on. Consequently, the state which occurs
will depend not only on the new response, but also on the previous A^(2)
state. Unlike the previous cross-coupled systems, however, it does not
depend on the time-lapse since the previous input, since the A^(2) state
has held steady over the interval.

By means of the feedback to the A^(1) set, the A^(2) state
(and consequently the response sequence) can be made to modify the response
of the A^(1) system to the present stimulus. Thus a distinct succession
of responses can be associated to the stimulus, each new state
signifying the joint information that the stimulus is present, and that a
particular succession of responses has occurred in the past. To terminate
such a sequence, it is possible to assume that one of the R-units has inhibitory
connections to all A^(2) units, so that when the end of the sequence
is recognized, the A^(2) system can be reset to its inactive state, by
turning on this response.
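
The independence of this scheme from the timing of the responses may be illustrated by a minimal sketch. It is not a model of the perceptron of Figure 63: the held state of the lock-on units is represented simply as a tuple, and the function `next_state`, the label "RESET", and the response names are hypothetical stand-ins for the cross-coupled dynamics and for the terminating R-unit.

```python
def next_state(held_state, response):
    """Hypothetical transition rule for the state held by the lock-on units."""
    if response == "RESET":
        # Stand-in for the R-unit with inhibitory connections to all
        # lock-on units: it returns the system to its inactive state.
        return ()
    # Stand-in for the cross-coupled dynamics: the new state is a function
    # of the state already held and of the response that has just occurred.
    return held_state + (response,)


def run(events):
    """Feed (time, response) pairs to the system and return the final state.

    The time stamps are deliberately ignored, because the state is assumed
    to hold steady between responses.
    """
    state = ()
    for _time, response in events:
        state = next_state(state, response)
    return state


fast = run([(0, "r1"), (1, "r2"), (2, "r3")])
slow = run([(0, "r1"), (50, "r2"), (400, "r3")])
print(fast == slow)  # True: only the order of the responses matters
```

Changing the order of the responses, on the other hand, leads to a different held state, which is the property required for sequence recognition.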

22.2  Conditional  Response  Sequences 

In  the  last  section,  the  response  sequences  learned  by  the 
perceptron  were  assumed  to  be  of  a  fixed,  stereotyped  variety,  such  as 
the  utterance  of  a  given  word  or  phrase,  or  the  execution  of  a  particular 
sequence of movements. Of more general interest is the possibility of
conditional response sequences, where the execution of the next step
depends upon the realization of a set of conditions at the present time.

In a limited sense, we have already demonstrated the possibility
of conditional responses in the perceptron of Figure 63, where the
next response depends not only upon the preceding R-sequence, but also
upon the continuation of the initiating stimulus. A more interesting case,
however, would be one in which the next response depends upon the recognition
of some condition which results from the preceding activity of the
perceptron itself. For example, if the perceptron is equipped with a moveable
appendage by means of which it can apply pressure to external objects,
we might ask it to push aside any object placed in front of it. Such objects
might have their movement blocked, either to the right or to the left, in
which case the perceptron might first bring its "pushing arm" into contact
with the left side of an object and try pushing to the right, but if it finds
that the object remains stationary, it must reverse the position of its arm,
and push to the left.

Such  a  decision  program  still  seems  to  be  within  the  capability 
of  a  perceptron  of  the  type  just  described.  It  must  recognize  (through  its 
visual inputs) the conditions "no object present", "object present to right of
arm location", "object present to left of arm location", "arm in contact with
left side but object stationary", "arm in contact with left side and object
moving",  etc.  The  recognition  of  the  contact  conditions  might  be  facilitated 
by  the  inclusion  of  pressure  transducers  on  the  arm,  providing  an  auxiliary 
sensory  input  to  the  association  system.  An  appropriate  response  sequence 
must  then  be  associated  to  each  of  these  conditions.  For  example,  if  the 
condition  "arm  in  contact  with  left  side  but  object  stationary"  is  recognized, 
the  response  sequence  might  be 

1. Retract arm

2. Shift arm position to right

3. Extend arm

This would then yield the condition "object present to left of arm location",
for which the response would be

1. Shift arm to left until it contacts object

2. Apply pressure

The conditions of "moving" and "stationary" objects can, of course, be
recognized by a perceptron with time delays from the retina to the A-units,
so that there is nothing in the above description which cannot be done,
in principle, by perceptrons which have already been analyzed.
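
The decision program described above amounts to a table which associates a response sequence to each recognized condition. The following sketch shows that logic only; the dictionary names and the `RESULTING_CONDITION` table (which stands in for the environment and the recognition process) are hypothetical, and nothing here represents how a perceptron would learn or store the associations.

```python
# Response sequences associated to recognized conditions (from the text).
RESPONSE_TABLE = {
    "arm in contact with left side but object stationary": [
        "retract arm",
        "shift arm position to right",
        "extend arm",
    ],
    "object present to left of arm location": [
        "shift arm to left until it contacts object",
        "apply pressure",
    ],
}

# Hypothetical stand-in for the environment: the condition which would be
# recognized after a given response sequence has been executed.
RESULTING_CONDITION = {
    "arm in contact with left side but object stationary":
        "object present to left of arm location",
}


def run_program(condition):
    """Execute response sequences until no associated sequence remains."""
    executed = []
    while condition in RESPONSE_TABLE:
        executed.extend(RESPONSE_TABLE[condition])
        condition = RESULTING_CONDITION.get(condition)
    return executed


steps = run_program("arm in contact with left side but object stationary")
print(len(steps))  # 5
```

Each step of the program depends only on the condition currently recognized, which is why no internal data storage is needed for this task.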

22.3  Programs  Requiring  Data  Storage 

In  all  of  the  sequential  programs  considered  above,  the  next 
step  has  been  determined  entirely  by  the  conditions  at  the  previous  step, 
and  a  knowledge  of  how  many  steps  have  already  occurred  in  the  current 
sequence.  More  elaborate  programs  require  a  conditional  response  based  on 
information  which  was  available  several  steps  previously,  but  is  no  longer 
present in the sensory input. The perceptrons considered so far can solve
such  problems  only  by  anticipating  all  possible  sequences  of  conditions, 
and  learning  a  unique  response  sequence  for  each  special  case.  This  rapidly 
becomes  impractical,  as  the  sequences  become  more  involved.  An  example 
of  such  a  problem  is  counting.  In  counting  from  zero  upwards,  we  first 
produce  a  sequence  of  single  digits,  from  one  through  nine;  we  then  add  a 
second  digit  (a  one)  and  reset  the  low  order  digit  to  zero.  The  one  in  second 
place  is  held  fixed,  while  the  low  order  digits  are  recycled,  and  is  then 
changed  to  two,  and  so  forth.  At  an  advanced  stage  in  this  procedure,  we 
may be holding three or four high-order digits "in memory" while modifying
the low-order digits. To perform such a program expeditiously, an internal
storage  mechanism  is  required,  which  can  be  set  to  hold  a  given  item  of 
information  and  read  out  or  altered  whenever  required.  Such  a  memory 
mechanism  is  much  more  like  a  conventional  digital  computer  memory  than 
anything  yet  encountered  in  perceptron  theory. 

While it is fairly easy to contrive systems which employ rigidly determined
gating mechanisms and more-or-less conventional computer memory
logic to provide a temporary storage device for a perceptron, no really satisfying
solution has been found to date. A biological system undoubtedly employs
something more subtle than a coded address system which transmits its
stored information on command, but the similarity in logical requirements
nonetheless suggests that there might be a similarity in structure at this
particular point. It should be remembered, however, that human ability to
perform complex algorithms without extensive practice and learning time
does not begin to approach that of a digital computer. The human computer
also tends to rely heavily on such external aids as pencil and paper to augment
his memory for relevant data, and with the aid of an external transcription of
its outputs, a perceptron can also be made to perform rather elaborate logic
(in the manner of Section 22.2).

Some  possible  cues  as  to  the  nature  of  temporary  data  storage  in 
the  human  brain  come  from  introspective  observations  of  recall  of  strings  of 
digits,  words,  or  melodies,  and  such  exercises  as  attempting  to  count  in 
binary  up  to  the  point  where  one  loses  track  of  the  number  on  which  one  is 
operating.  In  all  of  these  cases,  recall  is  helped  by  rhythmic  grouping  of 
elements,  and  by  visualization  or  auditory  imagery  of  the  elements  in  a 
continuously  recurrent  sequence.  It  seems  likely  that  an  active  memory, 
such  as  a  reverberating  loop  system,  which  continuously  rewrites  itself 
on  every  rehearsal  of  the  stored  information,  is  involved. 

22.4 Attention-Scanning and Perception of Complex Objects


The preceding sections have dealt with the phenomena of
program learning with respect to response sequences. A capability for
program learning is also useful for the direction of attention over a sensory
field, and the perception of a complex pattern or object by noting its parts
and the relations between them. The possibility of directing attention
selectively to part of the visual field was already observed in the last
chapter. A program-controlled perceptron could, therefore, be taught to
direct its attention successively to different parts of the field in some systematic
order, e.g., to scan from left to right, or top to bottom. It is also
plausible (although it remains to be demonstrated) that a back-coupled
perceptron can be taught to shift its field of attention along a contour, or
edge of a figure, so that the association set, at any one time, responds
only to part of the contour. Such a system, by starting at one point on a curve
and following it in one direction, could determine whether the curve is closed
or open by indicating whether the scan process returns to its starting point
without having lost the contour at any time.
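
The closed-or-open test by contour following can be sketched as an ordinary procedure on a grid. This is an illustration of the logic only, under assumptions which are not in the text: the curve is a set of grid cells one cell thick, it does not branch, and neighbouring cells are those sharing an edge. The function name `is_closed` is hypothetical.

```python
def is_closed(curve, start):
    """Follow a one-cell-thick, non-branching curve from `start` in one
    direction; report True if the scan returns to `start`, and False if
    the contour is lost (an end of the curve is reached)."""
    steps = [(1, 0), (0, 1), (-1, 0), (0, -1)]

    def neighbours(cell):
        x, y = cell
        return [(x + dx, y + dy) for dx, dy in steps if (x + dx, y + dy) in curve]

    first = neighbours(start)
    if not first:
        return False  # isolated point: nothing to follow
    previous, current = start, first[0]
    for _ in range(len(curve)):
        if current == start:
            return True  # the scan has come back to its starting point
        onward = [c for c in neighbours(current) if c != previous]
        if not onward:
            return False  # contour lost: the curve is open
        previous, current = current, onward[0]
    return False


ring = {(0, 0), (1, 0), (2, 0), (2, 1), (2, 2), (1, 2), (0, 2), (0, 1)}
segment = {(0, 0), (1, 0), (2, 0)}
print(is_closed(ring, (0, 0)), is_closed(segment, (0, 0)))  # True False
```

In the perceptron, the role of the variables `previous` and `current` would have to be played by the shifting field of attention itself, which is the part that remains to be demonstrated.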

In  the  recognition  of  a  complex  structured  object,  such  as  a 
man  (regardless  of  posture,  angle  of  view,  etc.)  a  program  of  observations 
might  note  significant  parts  and  the  transitions  between  them.  There  should, 
for  example,  be  a  head  joined  to  the  shoulders,  and  by  following  a  path  from 
one  of  the  hands,  the  system  should  successively  come  to  a  forearm,  shoulder, 
and  torso.  The  reader  may  recognize  a  similarity  between  this  suggestion 
and  Hebb's  concept  of  a  "phase  sequence"  (Ref.  33).  The  phase  sequence 
consists of a progression of cell-assemblies, each of which represents some
elementary perceptual fragment, the entire sequence representing a
perception of a complex stimulus or experience. In the perceptron, however,
the progression of states is assumed to be under the control of a learned
program, which directs the attention of the system in such a way as to make
first one set of A-units, then another set achieve dominance, by the
mechanisms described in Chapter 21. A sequence-recognizing system, such
as the five-layer perceptron shown in Figure 64, would be required for the
direction of the scanning process and for the recognition of the total configuration
from its parts. This system employs an A^(2) layer of the same type
as in Figure 63 (cross-coupled, with fixed interconnections, and A-units
which hold their state until triggered by a sufficiently strong signal to change).
The A^(2) set in this model, however, has variable-valued connections
both to a new R set, which can learn to recognize complex patterns from
sequences of parts, and also back to the A^(1) units, so that the system can
be taught to direct its attention in a systematic manner to look for anticipated
parts of the complex.


Figure  64  FIVE-LAYER  PERCEPTRON  FOR  RECOGNITION  OF  COMPLEX  PATTERNS  BY 
ATTENTION  SCANNING  PROGRAMS.  (BROKEN  ARROWS  INDICATE 
VARIABLE  CONNECTIONS). 

22.5 Recognition of Abstract Relations


It is apparent that the perceptrons proposed above are already
stretching  the  limits  of  what  has  been  firmly  established  analytically  and 
experimentally.  While  there  is  good  reason  to  think  that  the  proposed 
systems  would  work  in  principle,  they  are  highly  speculative,  and  we  are 
far  from  being  able  to  describe  their  performance  in  quantitative  terms. 
Nonetheless,  one  further  venture  in  extrapolation  seems  to  be  of  interest: 

As  was  previously  noted,  the  recognition  of  abstract  topological  relations 
(or  metric  relations,  for  that  matter)  cannot  be  performed  economically 
by  a  perceptron  which  is  required  to  grasp  the  relation  instantaneously  from 
a  complex  pattern.  The  relation  "A  is  inside  of  B",  for  example,  would 
require that the system be trained with all possible cases of "A inside B"
and "A outside B", even after it has been taught to identify patterns "A"
and  "B"  correctly.  It  seems  more  likely  that  a  program-controlled  perceptron, 
having  been  taught  to  recognize  patterns  A  and  B,  can  determine  whether  A 
is  inside  of  B  by  means  of  a  directed  scanning  process. 

Suppose  we  show  the  perceptron  a  complex  field,  containing  a 
circle  and  a  square,  both  of  which  it  has  previously  been  taught  to  identify, 
and  we  ask  the  system  to  indicate  whether  the  circle  is  inside  or  outside 
the  square.  This  question  could  be  answered  by  means  of  two  attention 
sweeps,  beginning  at  the  circle  and  first  sweeping  to  the  right,  then  returning 
to  the  circle  and  sweeping  to  the  left.  If  an  edge  of  the  square  is  encountered 
on one of the two sweeps but not on the other, then the circle is "outside"
the square; if an edge is encountered both to the right of the circle and to the
left, the circle is "inside" the square. A somewhat more elaborate
program would determine whether a known figure (e.g., a square or
triangle) is inside or outside of an arbitrary closed curve.
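
The two-sweep program can be sketched as follows. The sketch assumes, for illustration only, a rectangular grid of known width, a square given by the cells of its outline, and a circle small enough to be represented by a single cell; the function names are hypothetical. The case in which neither sweep meets an edge is not discussed in the text and is treated here as "outside".

```python
def square_outline(x0, y0, side):
    """Cells on the boundary of an axis-parallel square."""
    last = side - 1
    return {
        (x, y)
        for x in range(x0, x0 + side)
        for y in range(y0, y0 + side)
        if x in (x0, x0 + last) or y in (y0, y0 + last)
    }


def sweep_hits_edge(start, step, edge_cells, width):
    """Sweep attention horizontally from `start` (step = +1 to the right,
    -1 to the left) and report whether an edge cell is encountered."""
    x, y = start
    x += step
    while 0 <= x < width:
        if (x, y) in edge_cells:
            return True
        x += step
    return False


def circle_relation(circle_at, edge_cells, width):
    right = sweep_hits_edge(circle_at, +1, edge_cells, width)
    left = sweep_hits_edge(circle_at, -1, edge_cells, width)
    # Edge met on both sweeps: inside. Otherwise (one sweep, or by
    # assumption neither sweep): outside.
    return "inside" if (right and left) else "outside"


square = square_outline(5, 5, 10)  # occupies x and y from 5 to 14
print(circle_relation((9, 9), square, 20))  # inside
print(circle_relation((2, 9), square, 20))  # outside
```

The rule is adequate for a convex figure such as a square; a figure with concavities would call for the more elaborate program mentioned in the text.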

In  the  recognition  of  topological  relations  or  metric  relations 
(A  is  larger  than  B,  or  A  is  above  B),  and  in  programs  which  call  for 
attention  scanning,  it  would  probably  help  considerably  to  introduce  geometric 
constraints  into  the  S-A  and  A-A  connections  of  the  perceptron.  In  the  models 
which  have  been  of  primary  interest  up  to  this  point,  there  is  no  way  of  telling, 
apart  from  learned  association,  that  activity  of  a  particular  A-unit  refers  to  a 
particular  region  of  the  sensory  field.  The  A-unit  space  is  non-topological  in 
character;  it  has  no  well-defined  geometry  or  dimensionality.  This  means 
that,  apart  from  learning,  there  is  no  way  of  telling  from  observations  on 
the  state  of  the  A-units,  what  are  the  topological  or  geometrical  properties 
of  the  stimulus  which  is  present  on  the  retina.  While  it  seems  likely  that  a 
geometrically constrained organization of A-unit connections (e.g., increasing
the probability of interconnection between A-units whose retinal fields
lie in close proximity to one another) would be helpful, there is still no
indication  of  what  are  the  best  constraints,  or  what  gain  in  performance 
can  actually  be  realized  by  such  means. 

23. SENSORY ANALYZING MECHANISMS


The  term  "sensory  analyzing  mechanism"  will  be  used  for  any 
signal  transmission  unit  or  network  which  detects  and  transmits  information 
about  selected  parts  or  features  of  a  total  stimulus  pattern.  Such  mechanisms 
can  frequently  be  used  to  reduce  the  amount  of  information  which  the  perceptron 
must  be  prepared  to  evaluate.  They  are  particularly  useful  in  highly  organized 
environments  (such  as  the  familiar  visual  environment,  or  an  environment  of 
printed  words  or  spoken  language)  where  purely  random  stimuli  are  unlikely 
to  occur  or  are  of  little  interest.  Thus  a  mechanism  which  detects  boundaries 
of  a  solid  image  or  describes  gradients  and  contrasts  in  the  visual  field,  or 
performs  a  Fourier  analysis  of  an  audio  input,  or  which  encodes  speech  into  a 
sequence  of  phonemes,  would  be  considered  a  sensory  analyzing  mechanism. 

A  simple  sensory  unit  which  detects  the  level  of  illumination  at  a  given  point, 
or  an  A-unit  which  samples  the  illumination  over  a  selected  set  of  points  are 
also  sensory  analyzing  mechanisms. 

In  most  models  considered  thus  far,  little  attempt  has  been  made 
to  optimize  the  sensory  analyzing  mechanisms  employed.  The  random  origin 
configurations  which  have  generally  been  employed  can  be  shown  to  be  far 
from  optimum.  In  this  chapter,  various  methods  of  improving  this  primitive 
organization  will  be  considered,  particularly  with  respect  to  visual  and 
auditory  systems.  For  the  most  part,  these  mechanisms  are  assumed  to  take 
the form of built-in constraints, such as were considered briefly in the d.i.d.
models of Section 7.2.2, and the similarity-constrained perceptrons of
Section 15.3. The existence of such mechanisms in biological organisms
is supported by an increasing amount of evidence, such as Lettvin's studies of
frog vision (Ref. 51), Sutherland's studies of octopus vision (Ref. 98),
Gibson and Walk on depth perception (Ref. 24), Sauer's work on bird navigation
(Ref. 90), and Hubel's work on cat vision (Ref. 113). Since most of these
mechanisms appear to be hereditary rather than learned, it seems likely
that they may be realized either by simple spatial constraints in the distributions
of connections in the sensory network, or else by simple "typological
constraints" governing the kinds of cells which may be interconnected.

23.1  Visual  Analyzing  Mechanisms 


A  number  of  basic  strategies  for  processing  visual  information 
have  been  proposed.  Some  of  these  are  so  closely  tied  to  digital  computer 
processes  that  they  are  of  little  interest  for  a  biological  model,  while  others 
require  such  a  degree  of  logical  precision  and  so  large  a  system  as  to  be 
biologically implausible (e.g., Refs. 16, 17, 71). The techniques to be considered
here are grouped under four main headings: (1) Local property detectors;
(2) Hierarchical retinal field organizations; (3) Sequential programs (centering
and scanning methods); and (4) Sampling of sensory parameters. The possible
advantages of each of these methods will be considered (largely in a qualitative
fashion), and the problem of an optimum mixture of analyzing
mechanisms  (somewhat  analogous  to  the  "mixed  strategy"  problem  in  game 
theory)  will  be  discussed. 

23.1.1  Local  Property  Detectors 


The  term  "local  property  detector"  will  be  used  for  any 
mechanism  or  neuron  which  responds  to  some  particular  feature  of  the 
stimulus pattern at a particular location (for example, brightness, color,
contour direction, etc.). Contour detectors and other types of property
detectors have been described by Culbertson (Ref. 17), Taylor (Ref. 99),
Inselberg, Löfgren, and von Foerster (Ref. 4), and others. Lettvin and
associates (Ref. 51) have described four mechanisms (for detection of
contrast, convexity, or small spot detection, moving edge detection, and
dimming detection) which appear to map into four distinct layers of the frog's
tectum. Of particular interest for present purposes is the series of experiments
described by Hubel (Ref. 113), in which the cells of a cat's visual
cortex are shown to respond to lines and bars in particular positions and
orientations, or to stimuli moving in particular directions.

The  visual  property  detectors  which  appear  on  an  a  priori  basis 
to  be  of  maximum  value  for  pattern  recognition  in  an  ordinary  terrestrial 
environment  (where  the  main  purpose  of  the  system  is  to  detect  and  recognize 
coherent physical objects) include the following:

1) Brightness and color detection and measurement

2) Contour and gradient detection

3) Curvilinearity detection and measurement

4) Detection of angles, intersections, and discontinuities
of lines and boundaries

5) Spot detection

6) Sensing of textures, and measurement of texture
gradients

7) Velocity and acceleration detection and measurement

In order to recognize stimulus patterns or objects, information
of the types listed above must somehow be combined for different parts of
the retina, to provide an indication of the total configuration. This has been
the main job of the association units, in the perceptrons considered thus far.
In all cases considered in previous chapters, the A-units have formed
combinatorial functions of information coming from "local intensity detectors"
(the S-units); thus the only property detectors employed have been of the first
type. The perceptron illustrated in Figure 65 introduces an additional layer
of A-units immediately following the S-units, which can detect additional
properties of the types indicated above. The A^(2) layer, having its origin points
in the A^(1) layer, now responds to combinations of local properties such
as lines and gradients, rather than merely to points of light.


Figure  65  ORGANIZATION  OF  A  PERCEPTRON  EMPLOYING  LOCAL  PROPERTY  DETECTORS. 

The organization of origin fields for A-units serving as property
detectors of various types is illustrated in Figure 66. The single-connection
"point detector" serves merely as a logical relay for information which
could be obtained equally well directly from the retina. The concentric
field organization of the spot detector appears to be found (in the case of
the cat) more characteristically in the retinal ganglion cells than in the
visual cortex (Ref. 113). The various forms of line detectors and the
"Type 2" termination detector have all been observed in the cat's cortex
by Hubel. Hubel has also reported units which respond only to moving
stimuli, although the organization appears to be different from that suggested
in Fig. 66(a), for the "moving edge detector". There is some evidence that
the movement detectors in the cat rely more upon the simultaneous summation
of "off" signals from uncovered retinal points and "on" signals from retinal
points which have just been covered by the displaced stimulus.

Figure 66 ORIGIN FIELD ORGANIZATIONS FOR LOCAL PROPERTY DETECTORS.
(a) ORGANIZATION OF SENSORY FIELDS OF A^(1) UNITS. BROKEN LINES INDICATE FIELDS
OF INHIBITORY ORIGIN POINTS; SOLID LINES INDICATE EXCITATORY FIELDS. UNITS SHOWN
(SOME MARKED "OBSERVED IN CAT CORTEX"): POINT DETECTOR; SPOT DETECTOR; LINE
DETECTOR (LIGHT ON DARK GROUND); LINE DETECTOR (DARK ON LIGHT GROUND);
TERMINATION OR CORNER DETECTOR (TYPE 2); BOUNDARY OR GRADIENT DETECTOR;
TERMINATED LINE DETECTOR (TYPE 1); CORNER DETECTOR (TYPE 1); MOVING EDGE DETECTOR.
(b) TYPICAL A^(2) COMBINATIONS. POSITION OF RETINAL FIELDS OF A^(1) UNITS IS SHOWN
RELATIVE TO FIXED AXES, FOR EACH UNIT. EXAMPLES SHOWN (θ = 2 IN EACH): RESPONDS TO
HORIZONTAL LINE, TERMINATED AT RIGHT END; RESPONDS TO HORIZONTAL LINE, MOVING
DOWNWARDS.

The use of the Type 2 termination detectors is illustrated in
Fig. 66(b). An A^(2) unit which receives connections both from a termination
detector and a line detector crossing the same field can recognize that the
line approaches the inhibitory spot of the termination detector, but does not
cross it. The same termination detector, taken in conjunction with lines at
different angles, can serve to indicate termination of any one of the lines, so
that there is considerable saving by this method. In fact, if there are A
discriminable angles for straight lines, and r discriminable translates of
each line (so that there are about r^2 distinguishable termination points
scattered over the retina), then a system which employs Type 1 termination
detectors would require a total of r^2 A units to guarantee a detector
for each combination of angle and termination point. The use of Type 2
detectors in conjunction with line detectors (as in Fig. 66(b)) would require
only r^2 + rA units to convey the same information. If r and A
are both equal to 100, this means that 10^6 units are required with
Type 1 units, and 2 x 10^4 with Type 2 units. This may indicate why the
Type 2 configuration appears to be found in the cat, rather than the Type 1
configurations.
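
The comparison of the two schemes is a matter of simple counting, which the following sketch reproduces. It rests on a reading of the text that should be treated as an assumption: about r^2 termination points on the retina and rA line detectors (A angles, r translates each), so that Type 1 needs one unit per combination of angle and termination point, while Type 2 needs one termination detector per point plus the line detectors. The function names are hypothetical.

```python
def type1_units(r, angles):
    """One terminated-line detector for every (angle, termination point)
    pair, assuming about r**2 termination points."""
    return r ** 2 * angles


def type2_units(r, angles):
    """One Type 2 termination detector per termination point (about r**2),
    plus one line detector per (angle, translate) pair (r * angles)."""
    return r ** 2 + r * angles


r = angles = 100
print(type1_units(r, angles))  # 1000000
print(type2_units(r, angles))  # 20000
```

Under these assumptions the Type 2 organization needs fifty times fewer units at r = A = 100, and the ratio grows roughly in proportion to the number of discriminable angles.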

Figure 66(b) also demonstrates the multiple use of the same
elementary property detectors (A^(1) units) for a number of more complex
functions at the A^(2) level. Thus, the same A^(1) unit is employed both in a
terminated line detector and also as part of a moving line detector. Since
movement detection can thus be obtained quite economically at the A^(2)
level, the type of moving edge detector illustrated in Figure 66(a) would
tend to be obviated. Hubel's observations on the cat suggest that (although
more complex organizations may remain to be discovered) the most prominent
types of property detectors in the visual cortex are of very simple types,
such as the line and boundary detectors and Type 2 termination detectors
illustrated in Figure 66(a). In all of these cases, a single excitatory and
inhibitory field, with simple constraints on the density of connections of
each type, is sufficient to yield the mechanism indicated.
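
How a single excitatory field and a single inhibitory field can yield a property detector may be seen from a minimal sketch of a threshold unit. The geometry chosen here (a five-point excitatory row flanked above and below by inhibitory rows) and the threshold value are illustrative assumptions, not the field organizations of Figure 66; the function names are hypothetical.

```python
def a_unit_fires(stimulus, excitatory, inhibitory, threshold):
    """Threshold unit: +1 for each illuminated excitatory origin point,
    -1 for each illuminated inhibitory origin point."""
    drive = sum(1 for p in excitatory if p in stimulus)
    drive -= sum(1 for p in inhibitory if p in stimulus)
    return drive >= threshold


def horizontal_line_detector(x0, y0, length=5):
    """Origin fields for a light-on-dark horizontal line detector:
    an excitatory row with inhibitory rows directly above and below."""
    excitatory = {(x0 + i, y0) for i in range(length)}
    inhibitory = {(x0 + i, y0 + dy) for i in range(length) for dy in (-1, 1)}
    return excitatory, inhibitory


exc, inh = horizontal_line_detector(0, 1)
line = {(x, 1) for x in range(5)}                     # line on the field
blob = {(x, y) for x in range(5) for y in range(3)}   # uniform illumination
vertical = {(2, y) for y in range(3)}                 # line crossing the field

print(a_unit_fires(line, exc, inh, threshold=5))      # True
print(a_unit_fires(blob, exc, inh, threshold=5))      # False
print(a_unit_fires(vertical, exc, inh, threshold=5))  # False
```

The unit responds to the line but not to uniform illumination or to a line at right angles, which is the selectivity expected of a line detector.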

The  actual  advantages  which  might  be  realized  by  means  of 
various  types  of  property  detectors  have  been  investigated  for  several 
simple  discrimination  problems,  with  the  results  shown  in  Table  10.  Two 
types  of  environments  were  considered:  the  first  consists  of  the  letter  "T" 
in right-side-up and upside-down orientations, and the second consists of
the letter "L", also right-side-up and upside-down. Each letter can appear
in all translational positions. The problem of discriminating the right-side-up
"T" from the upside-down "T" is considered for a variety of retinal sizes
ranging from 20 x 20 to 1000 x 1000. The retina is assumed to be toroidally
connected in all cases. With both the T and the L, the horizontal line is
taken to be nine units long, while the height of the letter is ten units. The
thickness of the lines is one unit, throughout. The perceptrons considered
are of the type shown in Figure 65, with the assumption that all inputs to
A-units are excitatory. Rather than attempting to find optimum parameters
for the various types of property detectors, the number of A^(2) inputs is
always the minimum number which will permit the discrimination to be
achieved. Other parameters (and the introduction of inhibitory connections)
would undoubtedly permit more economical solutions, but this serves to
illustrate basic principles.

The table gives the probabilities of finding A^(2) units which
will discriminate between a given stimulus of the "positive" class (say the
upright position) and all members of the opposite class. The origin points
of the A^(2) units are assumed to be chosen at random from among the
A^(1) units. The first line of the table, in which the A^(1) units are
simple point detectors, corresponds to the case of a simple perceptron,
where each A-unit receives its input connections directly from the retina.
For such a system, it can easily be seen that at least two excitatory origins
and a threshold of 2 are required in order to distinguish between the
upright and upside-down "L", while three excitatory origins and a threshold
of 3 are required to distinguish the upright from the upside-down "T".
The figures in the first two columns of the table are influenced by small-retina
effects, which disappear for the 40 x 40 and larger retinas.
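
The minimum numbers of origin points quoted above can be checked by exhaustive search on a small toroidal retina. The sketch below is such a check under stated assumptions: the letters are drawn with the dimensions given in the text (the exact placement of the stem of the T at the center of its bar, and of the foot of the L to the right of its stem, is assumed), the unit has only excitatory point inputs with a threshold equal to the number of inputs, and the retina is 20 x 20. All function names are hypothetical.

```python
from itertools import combinations


def letter_T():
    bar = {(x, 9) for x in range(9)}    # horizontal line, nine units long
    stem = {(4, y) for y in range(9)}   # total height ten units
    return bar | stem


def letter_L():
    stem = {(0, y) for y in range(10)}  # height ten units
    foot = {(x, 0) for x in range(9)}   # horizontal line, nine units long
    return stem | foot


def flip(shape, height=10):
    """Upside-down version of a letter."""
    return {(x, height - 1 - y) for (x, y) in shape}


def translates(shape, size):
    """All translational positions of a shape on a toroidal retina."""
    return [
        frozenset(((x + dx) % size, (y + dy) % size) for (x, y) in shape)
        for dx in range(size)
        for dy in range(size)
    ]


def min_origin_points(shape, size=20, max_n=4):
    """Smallest n such that some n origin points, all inside the upright
    letter, are never all covered by any upside-down letter."""
    negatives = translates(flip(shape), size)
    for n in range(1, max_n + 1):
        for origins in combinations(sorted(shape), n):
            chosen = set(origins)
            if not any(chosen <= negative for negative in negatives):
                return n
    return None


print(min_origin_points(letter_L()))  # 2
print(min_origin_points(letter_T()))  # 3
```

The result for the T reflects a symmetry: the upside-down T is a point reflection of the upright T, so any pair of points lying in one also lies in some translate of the other, and at least three points are needed.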


Several general conclusions can be drawn from this table.
First of all, it is clear that the value of different types of property detectors
depends upon the stimuli to be discriminated as well as the size of the retina.
For the discrimination of the L-shaped stimuli, which require only two points
or blobs for discrimination, the best results are obtained with large (4 x 4)
square origin point configurations for the A^(1) units, while for the T's
a slightly elongated (4 x 5) configuration with a high threshold is preferable,
since it permits the use of only two A^(1) units instead of three per A^(2)
unit. Note that the advantage of the rectangular origin configuration over
the 4 x 4 square is pronounced only for large retinal sizes, however; for a
smaller retina than 20 x 20, the square configuration might actually be
preferable. For the conditions considered in this analysis, the following
equation for the probability of a useful A^(2) unit shows the effect of
increasing retinal size:


P 


rn 


(23.1) 


The  reader  may  find  it  instructive  to  examine  the  Q-matrices  for  a 
binomial  perceptron  in  these  problems,  and  satisfy  himself  that  they 
are  consistent  with  the  geometrical  requirement  that  three  inputs  and 
a  threshold  of  3  are  required  to  discriminate  between  the  upright  and 
upside  down  "T",  in  all  translational  positions. 

where

N_s = number of S-points in retina

m = number of useful combinations of origin configurations
for an A^(2) unit

r = number of admissible rotational positions for each
configuration

n = number of input connections to each A^(2) unit


For a large retina, P clearly becomes small very rapidly, and the situation
is aggravated by the requirement of many inputs for each A^(2) unit. Thus
for the discrimination of the upright and upside-down T, which requires
three point inputs, P goes from 10^-4 for a 20 x 20 retina to about 10^-16
for a 1000 x 1000 retina. The use of 4 x 5 bars as line detectors instead
of point configurations, while it improves the probability by more than three
orders of magnitude, still leaves a requirement for over 10^^ A^(2) units
if the T is to be discriminated reliably in the large retina. Even with
optimum parameters, the required number of A^(2) units is inadmissibly
large. Nonetheless, the recognition of the position of a 9 x 10 "T" in a
1000 x 1000 field is certainly well within the limits of human vision. Some
additional means must therefore be found, to provide an economical solution
for this problem without introducing a brainful of special "T-detectors".
The principles discussed in the following section, combined with the use of
property detectors, will be seen to yield a radical improvement in the
recognition of small stimuli.
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23.1,2  Heirarchical  Retinal  Field  Organizations 


The "retinal field" of an A-unit is the region of the retina in
which its origin points may be found. In a multi-layer system, the retinal
field of an A^(2) unit is the union of the retinal fields of the A^(1) units
which are connected to the A^(2) unit; in general, the retinal field of an
A^(k) unit is the union of the retinal fields of the connected A^(k-1) units.
In a perceptron with a hierarchical retinal field organization, the retinal
fields of the A-units tend to increase in area, the greater the logical
distance of the A-unit from the retina. For example, the A^(1) units may have
highly localized origin configurations for the detection of local properties
(as in Table 10); the A^(2) units could then detect combinations of
properties over a somewhat larger field (responding to small, compact figures
or parts of larger patterns); and a layer of A^(3) units might then be added
to respond to combinations of sub-figures over the entire retina. While the
general principle of organization is from small to large retinal fields as
the A-units increase in depth, it is not required that all A-units at a given
level have retinal fields of the same size; there may be A^(1) units, for
example, whose fields are larger than the smallest A^(2) fields, provided the
expected size of the retinal fields increases with increasing depth.
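
The recursive definition of a retinal field given above can be sketched in a
few lines of Python. This is only an illustration under stated assumptions:
the unit names, the 4 x 4 origin blocks, and the wiring are hypothetical and
are not taken from Table 10 or Figure 65.

```python
from typing import Dict, FrozenSet, List, Tuple

Point = Tuple[int, int]


def block(x0: int, y0: int, width: int, height: int) -> FrozenSet[Point]:
    """Rectangular patch of S-points, used here as a first-layer origin region."""
    return frozenset(
        (x, y) for x in range(x0, x0 + width) for y in range(y0, y0 + height)
    )


def retinal_field(
    unit: str,
    origins: Dict[str, FrozenSet[Point]],
    inputs: Dict[str, List[str]],
) -> FrozenSet[Point]:
    """Retinal field of `unit`: its own origin region if it is a first-layer
    unit, otherwise the union of the fields of the units connected to it."""
    if unit in origins:
        return origins[unit]
    field: FrozenSet[Point] = frozenset()
    for source in inputs[unit]:
        field |= retinal_field(source, origins, inputs)
    return field


# Hypothetical wiring: three first-layer units, two second-layer units,
# and one third-layer unit that combines both second-layer units.
origins = {
    "a1_left": block(0, 0, 4, 4),
    "a1_right": block(10, 0, 4, 4),
    "a1_far": block(30, 30, 4, 4),
}
inputs = {
    "a2_near": ["a1_left", "a1_right"],
    "a2_far": ["a1_far"],
    "a3": ["a2_near", "a2_far"],
}

for name in ("a1_left", "a2_near", "a3"):
    print(name, len(retinal_field(name, origins, inputs)))
```

In this toy wiring the field size grows with depth, which is the property
the hierarchical organization asks for in expectation only, not unit by unit.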

Such a system is clearly much closer to the organization of
the mammalian visual system than the uniform origin distributions which
were considered in previous models. A brief consideration was given to
constrained origin fields in Section 7.2.2, where it was found that no
appreciable gain in performance was obtained with large stimuli, such as
the squares and triangles of Experiment 7. The effects of employing
constrained retinal fields for the A^(2) units in the perceptron of Figure 65
will now be considered, for the range of retinal sizes shown in Table 10.
It was found in the preceding section that as the retina becomes large
relative to the size of the stimuli, the probability of finding a useful
A^(2) unit becomes inadmissibly small in the unconstrained system. Table 11
shows the effect of limiting A^(2) retinal fields to a 20 x 20 region of the
retina (located at random in a larger retina). Again, it should be remembered
that the parameters have not been optimized, and that appreciably better
results might be obtained with larger numbers of inputs to the A^(2) units,
and the inclusion of inhibitory connections. Nonetheless, a comparison
with Table 10 illustrates the marked improvement in the size of the system
necessary to achieve recognition in a large retina. The first column of
probabilities (for the 20 x 20 retina) is, of course, identical to the
corresponding column of Table 10, and the first line corresponds to a
three-layer model with constrained origin fields for the A-units. In the case
of the 1000 x 1000 retina, using the best of the A^(1) origin configurations,
a gain of more than five orders of magnitude is obtained, bringing the
discrimination problem for the first time within the capacity of a
human-sized brain model. Note, however, that the best A^(1) origin
configuration has shifted from the 4 x 5 bar with θ = 5 to the 4 x 4 square
with θ = 1.


The probability P' of finding a useful A^(2) unit in this system
is given by the following equation, which is analogous to (23.1):

    P' = [expression in N_s, N', m, r, and n; not legible in this copy] (23.2)

where m, r, and n are defined as for equation (23.1), N_s = number of
S-points in the retina, and N' = number of S-points in the retinal field
of an A^(2) unit. Taking the ratio of equations (23.2) and (23.1), we obtain
the relative advantage of the constrained retinal field system over the
unconstrained system:

    P'/P = [expression not legible in this copy]                        (23.3)

Thus the advantage increases exponentially with the number of connections
required to each A^(2) unit, and with the ratio N_s/N'. Both of these
effects can be seen in Table 11.


Clearly, if the system is required to recognize a stimulus of
diameter D, the size of the retinal field cannot be taken smaller than D
without loss of performance; the above equations assume that the retinal
field is large enough so that boundary effects can be neglected. The optimum
size, then, appears to be on the order of D, the expected stimulus diameter.
We now have the problem of how to deal with universes of stimuli which vary
in diameter from very small to very large patterns. The best choice of a
distribution of retinal field sizes for the A^(2) units will generally be one
which guarantees the same likelihood of finding a useful A^(2) unit for all
stimuli. For the particular case in which the stimulus diameter distribution
is uniform between the limits D_min and D_max, this can be approximately
realized by taking


    Prob (field diameter = D') = [right-hand side not legible
                                  in this copy]                         (23.4)

Table 11 suggests that stimuli of the complexity of alphabetic
characters ranging in size from .01 to 1 retinal diameter can be recognized
by a system the size of the human brain (10^[exponent illegible] units) by
employing a four-layer model, with a suitable combination of A^(1) property
detector configurations and a suitable distribution of A^(2) field diameters.
The recognition problem can be made considerably more difficult, however, by
adding additional degrees of freedom to the stimulus organizations. Consider,
for example, the following environment: Let W consist of two classes of
composite stimuli. Each stimulus consists of two 9 x 10 T's, which may be
located at any position in the retinal field, provided they are at least 10
retinal units apart. If both T's are right-side-up or if both are
upside-down, the stimulus is a member of the positive class; if one is
right-side-up and the other is upside-down, the stimulus is in the negative
class. Let us consider the probability of finding a useful A^(2) unit for
this dichotomy.


If these stimuli are to be differentiated by A-units with random-
point origin configurations (all excitatory, as in the previous examples) then
six connections and a θ of 6 are required for each A^(2) unit. By employing
one of the line-detector mechanisms of Table 10, 4 inputs and a θ of 4 are
required. The constrained-field system of Table 11 (with 20 x 20 retinal
fields for the A^(2) units) cannot be employed here, as the combined stimulus
pattern may cover the entire retinal field. The best that can be done is to
employ the A^(1) configuration of 4 x 5 bars, which yields a probability of
6 x 10^-25 of finding a useful A^(2) unit, with a 1000 x 1000 retina. (For
the single random point configuration -- the worst case -- the probability
is 7.34 x 10^[exponent illegible].)


By employing a five-layer topology, it is possible to take
advantage of the fact that each stimulus actually consists of two organized
sub-patterns, each having quite small dimensions relative to the retina.
Assume the A^(2) units to have 20 x 20 retinal fields, as in Table 11, while
the A^(3) units have two excitatory input connections, chosen at random
from among the A^(2) units. Thus the A^(1) units serve as local property
detectors, the A^(2) units serve as sub-pattern detectors, and the A^(3)
units integrate this information over the whole retinal field. (In this
particular problem, the performance could be improved further by taking a
larger number of input connections for each A^(3) unit, but as before, we are
trying to demonstrate basic principles rather than find optimum
organizations.) This five-layer system is compared with the four-layer system
in Table 12. For moderate numbers of connections to the A^(3) units in this
system, the probability of a useful A^(3) unit (with θ = 2) can be closely
approximated by the binomial probability:

    \Pr(\text{useful } A^{(3)} \text{ unit}) \approx
        \binom{x}{2} \, P'^{\,2} \, (1 - P')^{x-2}                      (23.5)

where

    P' = probability of a useful A^(2) unit for "sub-figure"
         discrimination, and
    x  = number of (excitatory) input connections to an A^(3) unit.

Thus with 25 inputs to each A^(3) unit the probabilities for the five-layer
systems could be increased by a factor of about 300. Note that even under
these conditions, however, while the problem becomes soluble for a
brain-sized system in the case of a 100 by 100 retina, it is still
unmanageable in the 1000 x 1000 retina.
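
The factor of about 300 quoted above can be checked numerically. The sketch
below reads Equation (23.5) as the binomial probability that exactly two of
the x inputs to an A^(3) unit come from useful A^(2) units; the symbol x and
the value of P' used here are assumptions chosen for illustration, not
figures from Table 12.

```python
from math import comb


def p_useful_a3(p_sub: float, x: int, theta: int = 2) -> float:
    """Binomial probability that exactly `theta` of the x randomly chosen
    inputs are useful sub-figure detectors, each useful with probability
    p_sub."""
    return comb(x, theta) * p_sub**theta * (1.0 - p_sub) ** (x - theta)


p_sub = 1e-6  # hypothetical probability of a useful A^(2) unit

two_inputs = p_useful_a3(p_sub, x=2)
twenty_five_inputs = p_useful_a3(p_sub, x=25)
gain = twenty_five_inputs / two_inputs

print(f"x = 2:  {two_inputs:.3e}")
print(f"x = 25: {twenty_five_inputs:.3e}")
print(f"gain:   {gain:.1f}")
```

For small P' the factor (1 - P')^(x-2) is very nearly 1, so the gain is
essentially the binomial coefficient C(25, 2) = 300.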

The difficulty of this problem for the large retina should not
surprise us; it is unlikely that a human subject, asked to perform the
indicated discrimination with tachistoscopically presented stimuli, could do
appreciably better than chance, where the two T's each subtend only 1/100
of the central visual field, and are located at random relative to one
another. Even the case of the 100 x 100 retina (where the T's subtend 1/10
the diameter of the field) would probably yield marginal results, if the
subject were not permitted time to scan the field or shift his attention
during the exposure. On the other hand, if the T's were constrained to lie
relatively close to one another (say within a 40 x 40 subfield) the problem
would probably not be difficult. This problem, however, could readily be
handled by a five-layer perceptron in which the A^(3) retinal fields were
constrained to a 40 x 40 region, while limiting the A^(2) fields to 20 x 20,
as before.

Thus it appears that a hierarchical organization with three association layers
is competitive with human visual performance, with respect to resolution of
detailed figures and recognition of complexes of sub-figures, under
conditions in which no scanning or shifting of attention is allowed.

If we were to complicate the problem by adding a third "T",
again placing the stimuli in the positive class if all T's face the same way,
and the negative class if some face up and some down, the probabilities of
finding suitable A^(2) and A^(3) units would again fall by many orders of
magnitude. For this problem, it is unlikely that any purely spatial and
parametric constraints on the network would permit a solution with only
10^[exponent illegible] units, with a retina appreciably greater than the
size of the stimuli. It is also unlikely that a human subject, under
tachistoscopic conditions, could do much better. Thus for complex
organizations of organized sub-figures, each of which has several degrees of
freedom independently of the others, some additional strategy must be sought
to improve recognition capability. The use of sequential observations seems
to be indicated at this point.

23.1.3 Sequential Observation Programs


The perceptrons considered in the last two sections, while
facilitating the discrimination of small patterns in which fine details provide
the essential information, are still far from optimum. For one thing, the
number of A-units required remains very large; for another thing, the
learning time would be correspondingly great, if the discrimination must be
learned for all combinations of figural elements. These difficulties can be
drastically reduced by the employment of a program-learning perceptron, such
as the models considered in the last chapter. In particular, a system of the
type described in Section 22.4, with a selective attention mechanism which
permits it to attend to one detail or sub-figure at a time, is likely to prove
useful in dealing with complex stimuli. Such a system can be employed in
at least two basic ways:

1)  It can be taught to recognize the presence of a sub-pattern
(a spot or region in which the fine structure is particularly dense) without
having to classify it or differentiate it precisely. It can then direct the
visual centering mechanisms to bring this pattern to the center of the retina,
where high resolution is possible, and where the system is taught to
differentiate the type of pattern more precisely.

2)  The  perceptron  may  be  taught  to  examine  each  of  a  number 
of  retinal  regions  in  turn  (either  by  a  systematic  scanning  procedure,  by 
following  boundaries,  or  by  directing  attention  to  those  sub-fields  in  which 
the  fine  structure  is  particularly  dense).  This  will  result  in  the  recognition 
of  a  definite  sequence  of  details,  which,  in  its  entirety,  serves  to  identify 
the  complex  stimulus  organization. 

The  recognition  of  small  objects  in  a  large  field  may  best  be 
achieved  by  the  first  of  these  methods,  while  the  discrimination  of  complex 
organizations  (e.g.,  individual  faces)  requires  the  second  method.  In 
employing  the  second  method,  it  would  be  particularly  helpful  if  the 
perceptron  could  shift  its  field  of  attention  systematically  in  a  given 
direction,  with  the  direction  of  attention  shift  provided  as  an  additional 
piece  of  information  to  the  association  system  at  all  times.  In  this  case, 
the  general  configuration  of  the  letter  "A"  followed  by  the  letter  "B" 
followed  by  "C"  could  be  recognized  by  starting  from  the  left  of  the  field, 
shifting  attention  right  to  the  first  "detail",  then  right  again  to  the  second 
detail, and then right again to the third. The recognition of this complete
sequence would indicate the ABC configuration regardless of the actual
positions of the letters in the field or their relative distances. It seems
likely that the general problem of relation-recognition will ultimately yield
only to sequential programs of this type.
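
The left-to-right scanning idea can be illustrated with a small sketch. It
assumes a hypothetical detail recognizer has already attached a label and a
horizontal position to each sub-figure; the learned attention-shifting
program itself is not modelled, and sorting by position merely stands in for
the successive rightward shifts of attention.

```python
from typing import List, Sequence, Tuple

# (horizontal position in the field, label from a hypothetical detail recognizer)
Detail = Tuple[float, str]


def scan_left_to_right(details: Sequence[Detail]) -> List[str]:
    """Return the labels in the order in which a rightward-moving field of
    attention would meet them."""
    return [label for _, label in sorted(details, key=lambda d: d[0])]


def matches_configuration(details: Sequence[Detail], target: str) -> bool:
    """True if the scanned sequence of details spells out `target`."""
    return scan_left_to_right(details) == list(target)


widely_spaced = [(0.10, "A"), (0.50, "B"), (0.90, "C")]
crowded_right = [(0.62, "B"), (0.60, "A"), (0.97, "C")]
wrong_order = [(0.10, "B"), (0.50, "A"), (0.90, "C")]

print(matches_configuration(widely_spaced, "ABC"))
print(matches_configuration(crowded_right, "ABC"))
print(matches_configuration(wrong_order, "ABC"))
```

Only the order of the details enters the comparison, so the first two
stimuli are treated as the same configuration despite their different
positions and spacings.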

23.1.4 Sampling of Sensory Parameters


A fourth basic strategy for simplifying the sensory data which
the perceptron must deal with is that of independent sampling of sensory
parameters. In a general visual input system, five parameters are of
interest: the intensity of illumination at a point, the frequency or color of
the illumination, the time at which it occurs, and the x and y coordinates
of the location of the point on the retinal surface. Each of these variables
may be varied independently of the others. If we required a retina of 1000
lines resolution (i.e., 10^6 points), with sensitivity to 10 frequency bands,
10 levels of illumination, and 10 time delays for the outputs of each S-point,
a total of 10^9 retinal points would be required to provide a sensory unit for
each combination of values.

If it is actually required to discriminate between any two patterns,
no matter how minute the difference between them, then there is no way of
escaping this requirement. In general, however, we are satisfied with
approximate information, and it is only under special conditions of "good
observation" that we expect to obtain the highest resolution from the system.
We can take advantage of this by means of the following organization.

Footnote: One sequential mechanism which may greatly improve performance is
to take a sequence of "looks" at a given stimulus, with different fixation
points selected at random, accepting a majority decision for the final
response. The gains which might be expected, assuming independence between
"looks", have been discussed in Reference 79, pp. 156-157.
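
The gain from a majority decision over independent "looks" can be estimated
with a standard binomial calculation. The sketch below is not taken from
Reference 79; the single-look accuracy of 0.7 is a hypothetical value, and
independence between looks is assumed, as in the footnote.

```python
from math import comb


def majority_correct(p_single: float, looks: int) -> float:
    """Probability that more than half of `looks` independent looks are
    correct, when each look is correct with probability p_single."""
    if looks % 2 == 0:
        raise ValueError("use an odd number of looks so that no tie can occur")
    needed = looks // 2 + 1
    return sum(
        comb(looks, k) * p_single**k * (1.0 - p_single) ** (looks - k)
        for k in range(needed, looks + 1)
    )


p_single = 0.7  # hypothetical accuracy of a single look
for looks in (1, 3, 5, 9, 15):
    print(looks, round(majority_correct(p_single, looks), 4))
```

As long as a single look is right more often than not, the probability of a
correct majority rises with the number of looks.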


Suppose we limit the number of retinal points to 10^6. To each
of these S-points, x and y coordinates are assigned at random (from a
uniform distribution over the whole field, rather than just points on a 1000
by 1000 lattice). In addition, a frequency drawn at random from the
sensitivity range of the system is assigned to each S-point, and a threshold
and time delay are similarly assigned at random. Now, if the perceptron sees a
moving figure, with a variety of shading and color variation, it will be less
precise in its judgement as to the exact position of the figure at time t, or
the color of a given point in the retinal field at time t, than would be the
case with the "complete" system with 10^9 S-points. If, however, we "fix"
the position of the figure on the retina, and provide maximum contrast
between illuminated and non-illuminated points (i.e., sharpen the figure to
a black and white silhouette), and observe it for long enough to permit all
time delays to propagate, then we have just as good shape-definition as in the
system with 10^9 S-points, since all 10^6 retinal points will contribute one
bit of information. Alternatively, if the entire field is illuminated at
maximum intensity with a given frequency of light, this frequency can be
discriminated to one part in 10^6, or five orders of magnitude better than the
previous model. The same will be true with respect to intensity discrimination
if the field is illuminated with white light, all frequency components being
present with the same intensity. Similarly, the velocity, acceleration,
and higher derivatives of the velocity of a moving object can be
discriminated much better with the 10^6 element randomized-parameter system,
provided the moving image consists of a sharp black and white pattern.
Finally, we note that if we wish to specify the exact retinal coordinates of
a square, the edges of which are aligned with the lattice points in the first
model, we can expect a maximum accuracy of one part in 1000, whereas
with the random configuration (where some of the points will fall virtually
on the boundary of the square regardless of its location) we could expect
to improve the performance by several orders of magnitude.
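
The last point, locating the edge of a sharp silhouette, can be illustrated
by a small simulation. It is a sketch under stated assumptions rather than
the procedure of the text: the field is taken as the unit square, each
S-point reports only whether it is illuminated, and the square's position,
its size, and the reduced number of points (200 x 200 instead of
1000 x 1000) are arbitrary choices made to keep the example quick to run.

```python
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, float]
Reading = Tuple[float, float, bool]  # (x, y, illuminated?)


def observe(points: Sequence[Point], left: float, bottom: float,
            side: float) -> List[Reading]:
    """One-bit response of every S-point to a bright square silhouette."""
    return [
        (x, y, left <= x <= left + side and bottom <= y <= bottom + side)
        for x, y in points
    ]


def bracket_left_edge(readings: Sequence[Reading]) -> Tuple[float, float]:
    """Interval (lo, hi] that must contain the square's left edge: hi is the
    leftmost illuminated point, lo the rightmost dark point to its left
    within the band of heights where illuminated points were seen."""
    lit = [(x, y) for x, y, is_lit in readings if is_lit]
    hi = min(x for x, _ in lit)
    y_min = min(y for _, y in lit)
    y_max = max(y for _, y in lit)
    lo = max(
        (x for x, y, is_lit in readings
         if not is_lit and x < hi and y_min <= y <= y_max),
        default=0.0,
    )
    return lo, hi


def lattice(k: int) -> List[Point]:
    """k x k regular lattice of S-points."""
    return [(i / k, j / k) for i in range(k) for j in range(k)]


def scattered(n: int, seed: int) -> List[Point]:
    """n S-points with independently random coordinates."""
    rng = random.Random(seed)
    return [(rng.random(), rng.random()) for _ in range(n)]


LEFT, BOTTOM, SIDE = 0.41237, 0.2718, 0.3  # hypothetical stimulus
K = 200

lat_lo, lat_hi = bracket_left_edge(observe(lattice(K), LEFT, BOTTOM, SIDE))
rnd_lo, rnd_hi = bracket_left_edge(
    observe(scattered(K * K, seed=1), LEFT, BOTTOM, SIDE)
)

print("lattice uncertainty:  ", lat_hi - lat_lo)
print("scattered uncertainty:", rnd_hi - rnd_lo)
```

With the same number of points, the lattice can never bracket the edge more
tightly than one lattice spacing, while the scattered points place samples
at many distinct x-coordinates along the edge and so bracket it far more
closely.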

What  is  sacrificed  in  this  system  is  the  ability  to  provide  full 
information  about  individual  retinal  points,  and  the  ability  to  provide  maximum 
precision  of  discrimination  in  the  case  of  shaded,  moving  figures.  It  would 
be  difficult,  for  example,  to  precisely  locate  the  boundary  of  a  moving  cloud, 
or  to  state  the  exact  colors  of  specified  points  in  a  continuously  varying 
mixture  of  colored  lights;  these  are  precisely  the  conditions,  however, 
under  which  a  human  observer  would  also  encounter  difficulty,  whereas  if 
we  optimize  the  conditions  of  observation  by  providing  stationary  figures  and 
sharp  contrast,  resolution  far  in  excess  of  the  "fixed  lattice  system"  can  be 
obtained.  Note  that  there  is  a  trade-off  between  the  resolution  obtainable  in 
one  parameter  and  the  resolution  in  other  parameters;  we  cannot  simultaneously 
optimize  conditions  for  observing  position  and  velocity,  or  color  and  intensity, 
for example. An interesting analogy can be drawn to the limitations on
simultaneous observation of related variables in quantum mechanics, although
there is no reason to suppose that the analogy is anything other than
coincidental.

23.1.5 "Mixed Strategies" and the Design of General Purpose Systems


In the preceding sections, it has been demonstrated that the kind
of network organization which is best suited for one stimulus environment
or discrimination problem may be far from optimum for a different problem.
The upright and upside-down T's, for example, might best be discriminated
by a specially designed T-detector; but in this case every other letter, or
combination  of  lines  which  might  be  encountered  would  have  to  have  its  own 
special  detector  mechanism,  and  the  system  would  be  useless  in  a  general 
environment.  Thus  the  question  arises,  if  we  know  only  the  general  character 
of  an  environment,  but  cannot  anticipate  all  discriminations  that  the  perceptron 
may  be  required  to  learn,  what  is  the  best  combination  of  stimulus  analyzing 
mechanisms  to  provide  a  good  "general  purpose"  system? 

This problem (on which no real analysis has been done to date)
seems to be related, at least superficially, to the mixed strategy problem
in game theory. The object of the game is to minimize the probability that
any discrimination problem likely to arise in nature will be insoluble, subject
to constraints on the size of the system, admissible learning times, etc. In
Equation (23.4) a proposed solution for the distribution of A^(2) fields was
presented, for the special case in which the stimulus diameters are uniformly
distributed. A more general solution should also consider the best mixture
of line-detectors, spot-detectors, point-combination detectors, etc., among
the A^(1) units, the number of layers to be employed and the distribution of
retinal fields among them, etc.

A few general rules seem to have emerged from studies thus
far. For one thing, it seems to be inadvisable to seek highly specialized
property detectors in the early stages of the network. A few basic types,
such as line and boundary detectors, spot detectors, termination detectors,
and movement detectors are certainly helpful, and yield appreciable
improvements over random-point combinations. But higher-level organizations
seem to be achieved better either by a mixture of simple properties at a
greater logical depth in the network (as in the five-layer system considered
in Section 23.2.2) or else by learning, at the R-unit level. For another
thing, the extension in depth of a hierarchical retinal field system is useful
for a limited number of levels, but extension much beyond three association
layers seems unlikely to improve capabilities appreciably in systems the size
of the human brain. Recognition problems which cannot be dealt with by a
five-layer hierarchical structure, due to the large number of small details
which must be considered in solving the problem, are best handled by a
sequential system, rather than by continuing to increase the depth of the
network.


It  is  questionable  whether  analytic  procedures  will  be  able  to 
make  much  headway  in  dealing  with  this  problem,  although  a  combined  attack 
with  simulation  techniques  and  analysis  wherever  applicable  should  yield 
considerably better information concerning the optimum organization for a
given visual universe.

23.2 Audio-Analyzing Mechanisms


The  sensory  analyzing  mechanisms  which  are  best  suited  to 
an  auditory  input  system  are  in  some  respects  similar  to  those  which  have 
been  considered  for  visual  inputs.  The  difference  in  character  of  typical 
auditory  patterns  (speech  in  particular),  where  temporal  organization  largely 
takes  the  place  of  spatial  organization,  leads  to  a  number  of  distinctive 
requirements.  The  following  sections  consider  several  of  these  special 
problems. 

23.2.1 Fourier Analysis and Parameter Sampling


In principle, a number of possible sensory representations
could be used for auditory material, including the continuous measurement
of the amplitude of a waveform; spectral analysis, with the amplitudes given
for all frequency components as a function of time; and various "reduced
information" systems, such as the indication of zero-crossings, or the
outputs of special filter systems. In the human auditory system, phase
information appears to be disregarded, and a Fourier analysis into spectral
components is employed. In a system designed to simulate human
performance in speech recognition, musical recognition, and related problems,
a presentation of the actual waveform would burden the system with a great
deal of excessive information. The same word spoken with slightly different
phase relations between the frequency components, for example, would
present completely different wave-shapes, which the perceptron would
have to learn to identify. Thus the spectral analysis of the audio input
seems preferable.

With a Fourier-analyzed input, the important sensory parameters
to be represented by an S-point are the frequency, amplitude (or threshold),
and time relative to the present (generally represented by connection delays).
With these three variables, the principle of independent sampling of sensory
parameters, discussed in Section 23.1.4, is again applicable. If the system
is required to discriminate 100 frequencies, 100 time delays, and 100
amplitudes, for example, then a total of 10^6 frequency-threshold-delay
combinations would be required with a "complete lattice" system. Using
independently sampled parameters, on the other hand, a system with only
1000 S-units could discriminate 1000 frequencies in an intense sustained
tone or mixture; it could discriminate 1000 amplitude levels in a "white
noise" mixture sustained for the duration of the maximum time delays; or
it could place the occurrence of an intense "pip" of white noise to a
precision of one part in 1000 in time. Under less optimum conditions, the
accuracy of discrimination in separate dimensions would be reduced, but
the composite organization could still be discriminated readily from an
appreciably different organization.

23.2.2 A Phoneme-Analyzing Perceptron


An introductory discussion of the phenomena of speech
perception can be found in the chapters by Licklider and Miller in Ref. 112.
Perceptrons  for  speech  recognition  and  the  association  of  names  with 
objects  or  events  have  been  discussed  in  Section  21.3.  In  these  systems, 
it  is  assumed  that  a  complete  word  must  be  learned  as  a  primitive  pattern, 
without  preliminary  analysis  into  significant  sounds,  or  phonemes.  In 
this  section,  a  more  sophisticated  perceptron,  capable  of  phonemic  analysis, 
will  be  described. 


The possible improvement in efficiency which can be obtained by analyzing
a word into a sequence of phonemes can be highly significant. If we consider
a hypothetical (and rather unnatural) language in which there are 100
allophones (or functionally equivalent sounds) for each phoneme, and a word
of five phonemes consists of an independent choice of one of the allophones
for each phoneme, then the word may appear in any one of $100^5 = 10^{10}$
possible forms. For a perceptron with a high degree of sensitivity to
differences in sound patterns, this would mean that the discrimination of
two words would require an enormous number of utterances (perhaps many
millions) in order to generalize to all equivalent pronunciations
(allomorphs) which might occur. (In actuality, the correlation between
choices of allophones for different phonemes, in ordinary speech, would
greatly reduce the sample size required, but the example will serve for
illustrative purposes.) On the other hand, if each phoneme were first
recognized by a distinct R-unit, and the outputs of the R-units taken as the
input for a word-recognizing perceptron, this second perceptron would
receive an invariant sequence for each word, and in principle a single
utterance of each word (morpheme) would be sufficient for complete
generalization. The phoneme-recognizing units would each have to distinguish
a set of 100 allophones from a universe of 500 (assuming that only five
phonemes are involved), so that the learning at this level might be achieved
quite readily.
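
The arithmetic of this example can be checked directly. The short Python
sketch below is only an illustration of the counting argument; the function
and variable names are invented here and do not come from the text.

```python
def word_forms(allophones_per_phoneme: int, phonemes_per_word: int) -> int:
    """Number of distinct pronunciations of one word when every phoneme
    position is an independent choice among its allophones."""
    return allophones_per_phoneme ** phonemes_per_word


def allophone_universe(allophones_per_phoneme: int, phonemes_in_language: int) -> int:
    """Total number of distinct sounds that a phoneme-recognizing unit
    has to sort into 'mine' and 'not mine'."""
    return allophones_per_phoneme * phonemes_in_language


# Figures used in the text: 100 allophones per phoneme, five-phoneme words,
# and (as the text assumes) only five phonemes in the whole language.
forms = word_forms(100, 5)
universe = allophone_universe(100, 5)
print(forms)     # 10000000000
print(universe)  # 500
```

The contrast is between a whole-word recognizer that faces $10^{10}$ variants
of each word and a phoneme recognizer that faces only 500 sounds in all.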

In actuality, the recognition of a phoneme is not as simple as the above
discussion suggests, since a single speech sound cannot, in general, be
recognized independently of its context. The preceding and subsequent sounds
may completely alter the sound of a vowel, for example. Thus a
phoneme-recognizing perceptron must itself be a sequence-recognition device,
rather than a momentary-pattern recognizer.

A perceptron which appears to be capable of analyzing a sequence of words,
so as to spontaneously develop an internal code for the phonemes employed,
is illustrated in Figure 67. It is a five-layer perceptron, with variable
connections between the $A^{(1)}$ and $A^{(2)}$ layers, and between the
$A^{(3)}$ layer and the R-units. The $A^{(2)}$ layer can be thought of as
playing the role of "R-units" for the first three layers of the perceptron,
and will eventually learn the phoneme code to be employed. At the same time,
it serves as the "sensory system" for the last three layers, which act as a
three-layer perceptron for word-recognition. The $A^{(3)}$ system may either
be organized as a cross-coupled system, or its input connections may be
given a spectrum of delays; in either case, it is capable of recognizing
sequences of inputs, rather than just momentary patterns. If the $A^{(3)}$
units are cross-coupled (particularly with inhibitory connections) and are
of the "flip-flop" variety, tending to remain in their present "on" or "off"
state until receiving a super-threshold signal, then the $A^{(3)}$ system
will tend to go to a state characteristic of the sequence of input patterns
regardless of the duration of the individual patterns in the sequence. This
is particularly true if the $A^{(2)}$ system goes through a sequence of
states (A, B, C, ...) where each state is "held" without variation for a
time greater than the "settling-down time" of the system (which should
normally be no greater than two or three transmission delays, for the
conditions given). Thus a "word" encoded into a sequence of phonemes by the
$A^{(2)}$ units would lead to a fixed state of the $A^{(3)}$ system upon its
termination, regardless of the actual duration of the phonemes.
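
The duration-invariance property claimed above can be illustrated with a
deliberately simplified toy. The Python sketch below is not a model of the
cross-coupled network itself: it replaces the flip-flop units by a single
latch whose state changes only when the input symbol changes, and the mixing
rule used to form the new state is an arbitrary assumption chosen only so
that the state depends on the order of the symbols.

```python
def step(state, symbol):
    """Advance the toy latch by one time unit.

    state is a pair (held_symbol, code). A repeated symbol is treated as a
    sub-threshold input: the flip-flops hold and nothing changes.
    """
    held, code = state
    if symbol == held:
        return state
    # A new symbol acts as a super-threshold input: move to a new state that
    # depends on both the old code and the new symbol (arbitrary mixing rule).
    return (symbol, (31 * code + ord(symbol)) % 997)


def final_code(sequence):
    """Code reached at the end of a sequence of phoneme symbols."""
    state = (None, 0)
    for symbol in sequence:
        state = step(state, symbol)
    return state[1]


# The same word spoken with different phoneme durations ends in the same state,
# while the reversed word ends in a different one.
print(final_code("AB") == final_code("AAABB"))  # True
print(final_code("AB") == final_code("BA"))     # False
```

A consequence of this simplification is that an immediately repeated phoneme
cannot be told apart from a single long one; the toy shares that limitation
with any purely duration-independent code.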

This effect, as well as some of the others discussed in this section, might
be employed to advantage in a visual system which is required to recognize
sequences of stimuli, such as successively presented letters or signals.

[Figure 67 (caption only partly legible): "... VALUED CONNECTIONS. NUMBERS OF
A(1) AND A(3) ... NUMBER OF A(2) UNITS."]

The reinforcement rule for the $A^{(3)}$ to R-unit connections is a
conventional $\alpha$-system rule, so that an error correction procedure may
be employed to teach the system to recognize words. The reinforcement rule
for the $A^{(1)}$ to $A^{(2)}$ connections, however, is a probabilistic one,
defined as follows:

1. With each connection from an $A^{(1)}$ unit to an $A^{(2)}$ unit is
associated a time-dependent probability, $P_{ij}(t)$, called the instability
coefficient of the connection.

2. Reinforcement at the preterminal level ($A^{(1)}$ to $A^{(2)}$ network)
is applied only upon the decision of the reinforcement control system, or
experimenter. Otherwise, the values of these connections remain unchanged.

3. If preterminal reinforcement is applied at time $t$, all instability
coefficients are changed by an amount $\Delta P_{ij}$ [expression only partly
legible; it involves $\delta' P_{ij}(t)$ and a further term]. If no
reinforcement is applied at time $t$, $\Delta P_{ij} = -\delta' P_{ij}(t)$.

4. If reinforcement is applied, assume that the current activity states of
all $A^{(2)}$ units are "wrong", and apply the correction [expression
illegible] with probability $P_{ij}$. (This is equivalent to an
$\alpha$-system error correction applied probabilistically.)

The actual training procedure can best be described in terms of the
following experiment:

Assume a language, L, possessing three phonemes, A, B, and C, with k
allophones of each phoneme. Time is quantized in units $\Delta t$. Each
phoneme persists for a duration $\Delta t$, unless otherwise indicated. Let
L consist of the six words AB, BA, AC, CA, BC, CB. Assume that some output
code is assigned to each word. Then the procedure for training the
perceptron is as follows:

Present a randomly chosen allomorph of the first word (AB), and observe the
response of the perceptron. If this is correct, go on to the next word (BA);
if it is incorrect, present AB again, this time applying (quantized)
error-correction reinforcement to the terminal connections ($A^{(3)}$ to
R-units). Again test the response to the word AB. If the response is now
correct, go on to the next word; otherwise, present the word again, this
time reinforcing the preterminal network ($A^{(1)}$ to $A^{(2)}$
connections) and leaving the terminal network unaltered. Then apply a second
correction to the terminal network, and retest the response to AB. Continue
alternating between reinforcements applied to the terminal network and
reinforcements applied to the preterminal network, until AB elicits the
correct response. Then go on to the next word (BA) and repeat the same
procedure. Continue cycling through the complete vocabulary until all words
have been learned correctly.
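
The alternating schedule just described can be written out as a control
loop. The Python sketch below covers only the order of presentations and
reinforcements; the perceptron itself is left abstract. The three callbacks
(`respond`, `reinforce_terminal`, `reinforce_preterminal`), the two safety
limits, and the stopping test "one full pass with no mistakes" are
assumptions introduced for the illustration, and `respond` is assumed to draw
a random allomorph of the word on each call.

```python
def train_vocabulary(words, target, respond, reinforce_terminal,
                     reinforce_preterminal, max_cycles=100, max_rounds=1000):
    """Cycle through the vocabulary with the alternating schedule of the text.

    Returns the number of the first cycle completed without any mistake,
    or None if max_cycles is exhausted first.
    """
    for cycle in range(1, max_cycles + 1):
        errors = 0
        for word in words:
            if respond(word) == target[word]:
                continue                      # correct: go on to the next word
            errors += 1
            reinforce_terminal(word)          # first correction: terminal network only
            rounds = 0
            while respond(word) != target[word]:
                rounds += 1
                if rounds > max_rounds:       # guard added for the sketch
                    raise RuntimeError(f"no convergence on {word!r}")
                reinforce_preterminal(word)   # terminal network left unaltered
                reinforce_terminal(word)      # then a further terminal correction
        if errors == 0:
            return cycle                      # a full pass with no mistakes
    return None
```

Note that the preterminal network is touched only after a terminal
correction has already failed, which is what keeps a good phoneme code from
being disturbed by errors the terminal network can fix on its own.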

A very limited amount of experimental work has been done with this system,
using coin-tossing experiments and pencil-and-paper simulation techniques to
investigate performance for the three-phoneme language considered above.
Note that in this experiment, the perceptron is never given a single phoneme
in isolation, but always as part of a two-phoneme word. Moreover, the
perceptron is never corrected for "mistakes" in a single phoneme;
reinforcements applied to the preterminal network are maintained for the
duration of an entire word, regardless of whether one or both phonemes are
causing the difficulty. Nonetheless, it is found that as long as the number
of $A^{(2)}$ units is greater than the number of phonemes ([value illegible]
has been found to work well), the system tends to form a phoneme-code at the
$A^{(2)}$ level; i.e., after a period of training, each phoneme (A, B, and
C) activates a different set of $A^{(2)}$ units, and all allophones of a
given phoneme tend to activate the identical set of $A^{(2)}$ units.

These results can be obtained in a very short training sequence (generally
less than one complete run through the 6-word vocabulary) with a suitable
choice of the parameters which determine the rate of growth and decay of the
instability coefficients, $P_{ij}$. On the other hand, no deterministic
system has been found which will yield comparable results, although
something like a dozen alternatives have been tried. A rough heuristic
explanation for the observed effect can be given as follows: When the system
arrives at some state in which the activities of the $A^{(2)}$ units
constitute a phoneme-code for the language, new words can generally be
learned with at most one or two reinforcements of the terminal network, so
that there is little occasion to reinforce the preterminal connections.
Consequently, the instability coefficients, $P_{ij}$, all decay towards
zero, and the probability of disrupting the learned code, even if a
reinforcement of the terminal network does fail to correct an error, is
negligible. On the other hand, if any two phonemes are assigned the same
code, there will be repeated confusions of words which can only be
distinguished by means of the undiscriminated phonemes. Consequently, the
preterminal network will frequently be reinforced for words containing these
phonemes, but not for other words. Therefore, the connections originating
from $A^{(1)}$ units which are activated by one of the conflicting phonemes
will tend to acquire large instability coefficients, leading eventually to
the modification of the $A^{(2)}$ responses to these phonemes. But since the
corrections are applied probabilistically, the system will tend to try out
arbitrary $A^{(2)}$ codes, and is thus immune to "trapping" cycles, which
tend to occur in deterministic models. In brief, the effect of the
instability coefficients is to make those connections most susceptible to
change which are most troublesome to the system.
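
The heuristic argument can be made concrete with a small numerical sketch.
The decay step below follows the rule stated for the unreinforced case (a
loss of a fixed fraction of the coefficient per time step). The growth step
is NOT taken from the text, whose expression is illegible in this copy: it is
an assumed form, chosen only so that coefficients rise for connections from
active units when preterminal reinforcement is applied and stay between 0
and 1. The names `growth`, `decay`, and both functions are likewise
hypothetical.

```python
import random


def update_instability(P, active, reinforced, growth=0.3, decay=0.1):
    """Return updated instability coefficients, one per originating unit.

    P          -- current coefficients (probabilities in [0, 1])
    active     -- whether each originating unit is active at this step
    reinforced -- whether preterminal reinforcement is applied at this step
    """
    updated = []
    for p, is_active in zip(P, active):
        p -= decay * p                      # decay, applied at every step
        if reinforced and is_active:
            p += growth * (1.0 - p)         # ASSUMED growth term (not from the text)
        updated.append(p)
    return updated


def connections_to_correct(P, rng):
    """Choose, independently with probability P[i], which connections change."""
    return [i for i, p in enumerate(P) if rng.random() < p]


# A unit that is repeatedly active during reinforced (troublesome) words
# becomes unstable; once reinforcement stops, its coefficient decays away.
P = [0.0, 0.0]
for _ in range(3):
    P = update_instability(P, active=[True, False], reinforced=True)
print(P)
for _ in range(50):
    P = update_instability(P, active=[True, False], reinforced=False)
print(P)
print(connections_to_correct([0.0, 1.0], random.Random(0)))  # [1]
```

Under these assumptions the coefficient of the "troublesome" connection
climbs quickly, while that of the quiet connection never leaves zero, which
is the qualitative behavior the explanation above relies on.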


It remains to be seen why the system tends to assign the same $A^{(2)}$ code
to all allophones of a given phoneme, rather than merely making up totally
unique codes for every input pattern. In part, this is helped by keeping the
number of $A^{(2)}$ units small, so that conflicts are likely to arise if
the code is not an economical one. The main effect, however, is due to the
fact that different allophones of a given speech sound are not arbitrary,
independent patterns, but tend to be highly correlated in the
frequency-time-amplitude picture which comes from the sensory system. Thus
the conditions are ideally suited for generalization from one allophone to
nearly identical sounds, from there to next-nearest neighbors, etc. In fact,
the tendency would be to classify all sounds identically (due to positive
$g_{ij}$ coefficients in an $\alpha$-system) were it not for the
intervention of the experimenter or r.c.s., which forces the separation of
significantly different sound patterns. The spontaneous clustering of
"similar" sounds can be compared to the spontaneous clustering of "similar"
visual stimuli discussed in Section 7.3, and demonstrated for a [symbol
illegible]-system in Experiment 9 (page number illegible).


By adding fixed back-connections from the $A^{(3)}$ to the $A^{(2)}$ units
in the perceptron of Figure 67, the recognition of individual phonemes may
be more readily influenced by the preceding sequence. Alternatively,
variable-valued back-connections from $A^{(3)}$ to $A^{(2)}$ units might be
conditioned, by a suitable training procedure, to provide a bias to the
$A^{(2)}$ units, tending to favor the recognition of the most probable next
phoneme, as determined by the prior sequence.

While the above discussion has concentrated on demonstrating the possibility
of a self-organizing mechanism for phoneme analysis, it is also possible to
employ a somewhat simpler version of the five-layer system in which the
$A^{(2)}$ units are actually trained by the experimenter to emit a chosen
code for each phoneme. In this case, the $A^{(2)}$ units are actually
R-units, and the probabilistic reinforcement rule for the preterminal
network is no longer necessary, an ordinary $\alpha$-system error correction
procedure being perfectly suitable. One might also consider the possibility
of extending the five-layer system in depth, by adding another A-unit layer
and terminal R-layer after the last layer of the present model. By
reinforcing first the terminal connections, then the outputs of one
intermediate A-layer, and finally those of another (in case of failure to
correct the mistake at the terminal level), the system might be expected to
develop a phoneme code in the initial part of the network, a syllable code
in the middle, and a code for complete words or phrases at the level of the
final R-units.

23.2.3 Melodic Bias in a Cross-Coupled Audio-Perceptron

The final stimulus analyzing mechanism to be considered is one which seems
likely to occur spontaneously in cross-coupled perceptrons (of the type
analyzed in Chapter 19) with audio-inputs. Suppose such a perceptron is
exposed to a random sequence of notes, covering a range of several octaves,
and played by a variety of string and wind instruments. Each note is held
long enough for the cross-connections of the association system to be
reinforced, before the next note is sounded. Then, assuming that the input
comes from a Fourier analyzing system, the fundamental will be associated
most strongly to the major overtones of the sequences characterizing the
instruments employed. Thus the main association will generally be to the
octave above or below, next to intervals of a perfect fourth and fifth, etc.
This means that the main harmonic intervals of a twelve-tone scale will tend
to predominate, rather than purely random frequency associations.

Such a system will tend to respond most unambiguously to chords and
combinations of notes bearing a simple harmonic relation to one another
(e.g., perfect fifths, fourths, and octaves) while strongly discordant
combinations will tend to create a conflict (particularly in a [symbol
illegible]-system) such that the system tends to oscillate between several
alternative and mutually competitive activity states.
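
The link between the overtone series and these intervals is a matter of
simple arithmetic. The Python sketch below maps each of the first few
harmonics of a fundamental to the nearest step of an equal-tempered
twelve-tone scale, folded into one octave. It illustrates the frequency
ratios only and says nothing about the perceptron; the function name and
the choice of harmonics 2 through 6 are assumptions made for the example.

```python
import math


def interval_class(ratio: float) -> int:
    """Nearest equal-tempered interval for a frequency ratio, expressed in
    semitones and folded into a single octave (0 to 11)."""
    return round(12 * math.log2(ratio)) % 12


# Upward: the n-th harmonic of the fundamental.
# Downward: the lower note of which the fundamental is itself the n-th harmonic.
for n in range(2, 7):
    print(n, interval_class(n), interval_class(1 / n))
```

For harmonics 2 through 6 the upward classes are 0, 7, 0, 4, 7 semitones
(octave, fifth, octave, major third, fifth), and the downward classes are
0, 5, 0, 8, 5, so the fourth appears as the fifth taken below the
fundamental.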

By  adding  variable-valued  back-connections  from  R-units  to  A-units 
(as  in  Figure  60),  and  associating  a  different  response  to  each  fundamental 
tone,  the  perceptron  can  be  made  to  emit  responses  corresponding  to  a 
melodic  sequence,  if  each  response  in  turn  is  suppressed  shortly  after  it  is 
turned  on.  Such  a  perceptron,  preconditioned  as  above,  will  tend  to  pick  a 
harmonically  consistent  sequence,  probably  avoiding  major  shifts  in  tonality 
except  by  means  of  gradual  progressions. 

These observations, although suggestive, should not be overinterpreted. It
seems plausible that melodic and harmonic biases in music have a fundamental
basis in the overtone series (as Hindemith has suggested); however, the ease
of vocal transition from one note to the next, and other considerations
which play no part in the above model, are undoubtedly of equal importance
in the determination of musical traditions and the conditioning of musical
perception.

24. PERCEPTION OF FIGURAL UNITY


In  almost  all  tests  of  perceptron  performance  considered  in 
previous chapters, the environment, or stimulus world, was assumed to
consist  of  discrete  objects,  or  events,  occurring  one  at  a  time  in  an  ordered 
sequence.  The  actual  physical  environment  which  we  experience  on  a  day-to- 
day  basis  is  not  of  this  form;  the  visual  field,  in  particular,  is  likely  to  contain 
a  large  number  of  different  objects,  patterns,  or  constellations  of  objects 
simultaneously.  In  human  perception,  it  is  easy  to  detect  and  name  familiar 
objects  in  an  unfamiliar  scene,  such  as  a  landscape  or  a  strange  room.  For  a 
perceptron,  each  such  combination  of  objects  represents  a  new  "composite" 
stimulus.  If  the  composite  organization  consists  of  familiar  patterns  which 
have  previously  been  learned  in  isolation,  then  it  has  been  demonstrated  that 
the  perceptron  may  attend  selectively  to  one  object  or  pattern,  and  respond 
consistently  to  this  object  (see  Chapter  21).  For  the  human  observer,  however, 
it  is  not  necessary  for  the  individual  objects  or  component  patterns  in  the  field 
to have been previously learned individually; totally new and unfamiliar
organizations may nonetheless be perceived as "objects". Other organizations, no
matter  how  familiar,  will  always  be  perceived  as  sets  of  objects,  rather  than 
as  single  entities. 

The organization of a complex field into "objects" or distinct entities is
frequently ambiguous, in that the field permits many alternative
constructions or organizations of "meaningful parts". Problems of reversible
perspective, the interpretation of Rorschach ink blots, or the detection of
alphabetic characters in collections of random lines, all serve to indicate
this ambiguity. The recognition of an "object" in the human perceptual
process is generally experienced as a figure-ground organization, in which
the object emerges as "figure" while the rest of the field serves as
"ground". Hebb, who holds the segregation of figural patterns to be an
innate process, has proposed the term "primitive unity" for such figural
entities (Ref. 33). The perception of such unity is clearly essential for an
organism which must move about and interact with the objects of its
environment. It applies not only to spatial organizations but to temporal
sequences as well; a sequence of human movements is broken up, perceptually,
into acts, steps, or gestures, while speech or music is divided into words
or phrases, even if the sequence of sounds is an unfamiliar one.

The Gestalt psychologists have considered the problem of figural unity from
the standpoint of what constitutes a "good figure" (cf. Ref. 44). It is
assumed that certain organizational properties of the stimulus field lead to
a preference for one figural organization rather than another, and
considerable experimental data have been gathered on the influence of such
factors as contrast, boundedness, connectedness, and the like. There is no
doubt that all of these factors are important determinants of
figure-organization in human perception. For present purposes, however, we
will attempt to work with the hypothesis that what is most readily seen as a
figural entity in a given environment tends to be an organization which is
likely to undergo a continuous transformation in that environment (e.g., a
detachable rigid object, or surface bounded by discontinuities). Whether the
patterns which are most likely to be operated on by a continuous
transformation are learned or innately recognized is left open, for the time
being; it seems likely that both innate and acquired biases are at work in
human vision.

Posing the problem in this form suggests that the system must be sensitive
to cues indicating rigid, movable objects, or surfaces (such as the faces of
a cube) whose two-dimensional projections may undergo transformations which
are discontinuous at their boundaries (i.e., the object moves, but adjoining
regions of the field do not, or undergo a different kind of motion). The
attempt to define figural objects as connected blobs of uniform illumination
(as has been advocated in several computer programs) seems quite
inapplicable, except under highly contrived and artificial conditions. It
seems likely that in actuality, a combination of many different cues of
"good figure" is at work simultaneously, the final organization being
arrived at by an active process, typically involving a good deal of trial
and error before a good "fit" is obtained.

The  cues  which  are  suggested  by  psychological  experiments  as  being 
influential  in  the  determination  of  figural  organization,  or  the  perception  of 
separate  entities,  include  the  following: 

1)  Differential  motion  of  textured  or  bounded  regions,  or  sets  of 
points  in  the  retinal  field. 

2) Cues indicating differential distance or "depth" of surfaces, or
sets of points.

3) Differential surface properties in a bounded region (e.g., color,
texture, or type of fine-structure).

4)  Contours,  boundaries,  or  discontinuities  in  surface  gradients. 

5)  Object  familiarity. 

These five types of information are listed in approximate order of their
strength, or dominance. If two constellations of points in a visual field
are seen in relative motion, then even if they are intermixed spatially,
they will tend to be seen as distinct objects, and the observer will have
difficulty attending to both simultaneously. This is illustrated by the view
of a moving scene outside a dust-streaked train window: either the window or
the outside scene can be viewed as an object, but not both in combination.
An experiment by Gibson employs motion pictures of talcum powder scattered
on two glass plates, one behind the other. As long as both plates are
stationary, or both moved jointly, the two planes cannot be separated; as
soon as differential motion is introduced, however, the picture breaks up
unmistakably into two planes, each with its own distribution of spots.

The relationship of depth to figure organization is well known, and suggests
that an attack on problems of depth perception in perceptrons will also
contribute a great deal to the figural unity problem. The remaining cues
(contrasting surface areas, boundaries, and familiar object recognition) are
the ones most generally incorporated in attempts at devising computer
programs or nerve-net models for figure segregation. It should be noted that
the last of these (object familiarity) is the only one demonstrated as
workable in perceptrons up to this point, in the selective attention
mechanisms of Chapter 21; nonetheless, this mechanism is only usable under
relatively ideal conditions, in which objects are present without overlap,
confusing lines, spots, or "camouflage", and where it can be assumed that a
pattern which contains the sensory components of a familiar object actually
represents the object, rather than a random concatenation of lines or points
of illumination.

In  order  to  evaluate  the  performance  of  a  perceptron  in  the  realm  of 
figural  organization,  or  the  "perception  of  unity",  a  suitable  set  of  criterion 
experiments  must  be  defined.  This  proves  much  more  difficult  than  in  the 
testing  of  discrimination  capabilities,  or  the  study  of  generalization  over  a  given 
group  of  transformations.  In  the  simplest  case,  we  may  require  a  decision  as  to 
presence  or  absence  of  a  figure  in  a  noisy  field.  In  this  case,  the  detection 
experiments  discussed  in  Sections  7.4  and  8.4  may  be  employed,  with  little 
ambiguity. In the case of organized environments, however (cf. the
illustration in Section 8.4), it is frequently difficult to decide on an a
priori basis that a particular decision is "right" or "wrong". If the field
is sufficiently ambiguous, or the context is not indicated, a particular
pattern of lines might represent the letter "E" or a random pattern of
cracks on concrete. To evaluate performance on such material, it may be
helpful to run the same experiment with human subjects, to provide an
arbitrary standard for comparison. The results, however, are always subject
to interpretation, based on the intentions, experience, and additional
information available to the human observers in contrast to the perceptron.

Three types of criterion experiments seem possible:

1) Description of the figure by a multi-response perceptron (e.g.,
"small right triangle in upper left, with cross-hatched surface").

2) Detection of familiar objects; the perceptron is trained to tell
whether the object is present or absent.

3) "Test-point experiments", where the perceptron can attend selectively
to a test-point, or the end of a pointer placed in the field,
and tell whether or not the point is in contact with the figure.
In this way, a description of the figure can be obtained by tracing
its boundaries, or obtaining an inventory of its parts.

Little work has been done, to date, to determine the capabilities of
cross-coupled and back-coupled perceptrons in experiments of these types.
The detection experiment is the one most readily performed with the systems
analyzed to date, and it is hoped that some data can be obtained in the near
future. Series-coupled perceptrons appear to offer little hope of good
performance in these problems.

Cross-coupled perceptrons have been observed to form mutually exclusive
"cell assemblies" in their association systems, under the spontaneous
organization rules considered in Chapter 19. It is possible that with a
suitable choice of preconditioning sequence and network parameters, such
cell assemblies may be related to figural organizations, so that when two or
more rival figure-ground organizations are present, the A-units activated
will correspond to one of these organizations in preference to the others.
At present, however, this conjecture must be regarded as pure speculation,
with no real evidence to support it.


The introduction of back-coupling, however, does permit the perceptron to
take advantage of the first and most powerful cue as to figural
organization, namely, differential motion. A suitable organization is
illustrated in Figure 68. The perceptron is a three-layer system with
multiple R-units, of the "on-off" variety. Each R-unit is trained to respond
to a different motion, or transformation.

Figure 68 A PERCEPTRON FOR FIGURAL SEPARATION OF MOVING PATTERNS.
(R-unit labels in the figure: "UP", "DOWN", "RIGHT", "LEFT".)

The variable connections from A to R-units and from R to A-units are
reinforced as in Chapter 21, for selective attention systems with variable
back-coupling. Due to the spectrum of time delays, the A-units respond
directly to the movement pattern as well as the shape of the stimulus. The
system may be further improved by adding inhibitory interconnections between
the R-units, so that only one can go on at a time. If there should be two
stimulus patterns simultaneously present on the retina, moving in opposite
directions (or one moving and the other stationary), the dominant response
will tend to support those A-units responding to the stimulus whose motion
corresponds to the R-unit, and will suppress the A-units responding to the
second stimulus. The threshold servo plays the same role as in the systems
of Chapter 21. If the A-system is cross-coupled, with a [symbol
illegible]-system rule, the effect will be supported by the formation of
"cell assemblies" characterizing different directions or velocities of
motion.
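
The selection effect described here can be sketched in a few lines. The
Python example below is a schematic of the idea only, under strong
simplifying assumptions that are not in the text: each active A-unit is
tagged with the single motion it responds to, the R-unit with the largest
summed input is taken as the one that wins under mutual inhibition, and the
back-coupling is represented by a fixed positive or negative bias. The
function name and the `support` and `suppression` parameters are hypothetical.

```python
def separate_by_motion(a_units, support=1.0, suppression=1.0):
    """Pick the dominant motion response and gate the A-units accordingly.

    a_units -- non-empty list of (motion_label, signal) pairs, one per
               active A-unit
    Returns (winning_label, gated), where gated lists each A-unit's activity
    after feedback from the winning R-unit.
    """
    # Input to each R-unit: summed signal of the A-units tuned to its motion.
    drive = {}
    for motion, signal in a_units:
        drive[motion] = drive.get(motion, 0.0) + signal

    # Mutual inhibition between R-units: only the strongest one goes on.
    winner = max(drive, key=drive.get)

    # Back-coupling: support A-units agreeing with the winner, suppress the rest.
    gated = []
    for motion, signal in a_units:
        feedback = support if motion == winner else -suppression
        gated.append((motion, max(0.0, signal + feedback)))
    return winner, gated


# Two stimuli on the retina: a stronger one moving left, a weaker one moving right.
winner, gated = separate_by_motion([("LEFT", 0.9), ("LEFT", 0.8), ("RIGHT", 0.7)])
print(winner)                               # LEFT
print([m for m, s in gated if s > 0.0])     # ['LEFT', 'LEFT']
```

After gating, only the A-units belonging to the leftward-moving stimulus
remain active, which is the sense in which the moving figure is separated
from the rest of the field.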




As  the  stimulus  field  becomes  increasingly  ambiguous  in  its 
organization  (as  in  ink-blot  patterns,  for  example)  the  field  organization  which 
results  in  a  human  observer  depends  less  on  a  passive  response  to  automatic 
mechanisms,  and  more  on  an  active  "construction"  of  a  meaningful  figure.  In 
this  process,  a  number  of  alternatives  may  be  reviewed  in  quick  succession, 
before  one  of  them  "settles  in",  and  the  field  loses  its  ambiguity.  This  sort 
of active structuring of the field may also be possible for a perceptron with
feedback  loops  from  the  R-units,  if  the  perceptron  can  evaluate  the  strength, 
or  decisiveness  of  its  response,  and  actively  perturb  its  response  state  (and 
hence  the  feedback  signals  to  the  A-units)  until  a  strong,  persistent  response  is 
obtained.  This  may  be  done  by  adding  random  Gaussian  noise  signals  to  the 
inputs  of  the  R-units,  resulting  in  frequent  changes  in  the  response  state  as  long 
as  the  signals  from  the  A-system  are  weak  and  indecisive. 
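
The effect of the added noise can be illustrated numerically. The Python
sketch below is a toy, not a model of the back-coupled system: it treats the
R-units as a simple "largest noisy input wins" choice at each time step, with
no feedback to the A-units, and counts how often the winning response
changes. The function name, the unit noise level, and the example signal
strengths are assumptions made for the illustration.

```python
import random


def count_response_changes(signals, noise_sd, steps, seed=0):
    """Count how often the winning R-unit changes when independent Gaussian
    noise is added to the R-unit inputs at every time step."""
    rng = random.Random(seed)
    changes = 0
    previous = None
    for _ in range(steps):
        noisy = [s + rng.gauss(0.0, noise_sd) for s in signals]
        current = noisy.index(max(noisy))       # the R-unit that is "on"
        if previous is not None and current != previous:
            changes += 1
        previous = current
    return changes


# Weak, indecisive A-system signals: the response state keeps changing.
weak = count_response_changes([0.0, 0.0], noise_sd=1.0, steps=200)
# One strong signal: the response persists despite the same noise.
strong = count_response_changes([10.0, 0.0], noise_sd=1.0, steps=200)
print(weak, strong)
```

With equal inputs the response flips on roughly half the steps, while a
signal ten noise standard deviations above its rival holds the response
fixed; this is the behavior needed for the search to stop once a decisive
organization is found.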

While  the  above  discussion  indicates  several  possibilities  which  are 
open  to  experimental  treatment,  it  is  clear  that  much  fundamental  groundwork 
remains  to  be  completed  before  the  problem  of  figural  unity  can  be  attacked  in 
a  systematic  manner.  At  the  present  time,  this  problem  remains  one  of  the 
most severe challenges to all theories of brain mechanisms.

25. VARIABLE-STRUCTURE PERCEPTRONS


All  of  the  memory  mechanisms  employed  in  previous  chapters 
employ  a  fixed  network  structure,  in  which  the  weights  of  connections  are 
variable.  It  is  occasionally  proposed  that  a  system  in  which  the  structure  of 
the  network  itself  is  modifiable,  with  new  connections  being  formed  and  old  ones 
discarded  on  the  basis  of  demonstrated  utility,  might  lead  to  the  evolution  of  a 
better  model,  with  a  smaller  number  of  logical  elements  than  would  be  possible 
for  a  fixed-structure  perceptron  with  random  connections.  This  might,  for 
example,  be  a  way  of  evolving  special-purpose  stimulus  analyzing  mechanisms 
of a high degree of utility for a particular environment. A model in which
structural modification is possible -- i.e., in which the origins or termini of
connections are changed as a result of activity -- has previously been referred
to as an "evolutionary model". Apart from the possibility that such a system
might provide a useful memory mechanism, or adaptive mechanism, it has been
suggested that by observing the terminal states to which such a model goes,
after long exposure to an environment, we might learn something about the
kinds of physical constraints which could be usefully built into future systems
at  the  outset. 

25.1 Structural Modification of S-A Networks

To  date,  very  little  work  has  been  done  with  evolutionary  systems. 
Several  examples  have  been  programmed  for  the  IBM  704,  which  indicate  a 
slight  improvement  in  some  cases,  but  these  programs  have  proven  too  costly 
in  computer  time  to  permit  extensive  experimentation.  The  cases  illustrated 
here come from this group of pilot experiments. A three-layer perceptron
with a single R-unit was employed, and an α-system error correction method
was used for reinforcing the terminal network.

The  programs  were  written  by  Kesler,  and  carried  out  at  the  AEC/NYU 
Computing  Center. 

The rules for changing the structure of the network are closely
analogous to those employed for perceptrons with variable S-A connections,
in Chapter 13. Each A-unit, $a_i$, is continuously evaluated by means of a utility
measure, $t_i$. If the current response $r^*$ is wrong, $t_i$ may be increased by 1
with probability $P_1$, $P_2$, or $P_3$, defined as follows:

$P_1$ = probability of incrementing $t_i$ if the sign of $v_i$ disagrees
with the desired classification of the current stimulus, and
$a_i$ is active.

$P_2$ = probability of incrementing $t_i$ if the sign of $v_i$ agrees with
the desired classification of the current stimulus, and $a_i$
is inactive.

$P_3$ = probability of incrementing $t_i$ if the sign of $v_i$ disagrees
with the desired classification, and $a_i$ is inactive.

The quantities $t_i$ are assumed to decay by an amount $\delta t_i$ at each
stimulus presentation time. If $t_i$ reaches or exceeds a threshold level,
the origins of all connections to unit $a_i$ are reassigned, and $t_i$ is reset to zero.
In most experiments, $P_1 > P_2 > P_3$, so that an A-unit is most likely to have its
connections changed if the value of its output signal frequently disagrees in
sign with the intended classification of the stimulus which activated the unit.
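
The evaluation rule can be sketched in code. The following Python fragment is one illustrative reading of the rule, not a reconstruction of the IBM 704 programs: the function and argument names, the ordering of the steps within one presentation (increment, threshold test, then decay), and the treatment of the decay as a fixed fraction of the current utility are all assumptions made for the example.

```python
import random


def utility_step(t, active, agrees, response_wrong, probs, decay, threshold, rng):
    """One stimulus presentation for a single A-unit.

    t              -- current utility measure of the unit
    active         -- True if the unit is active for this stimulus
    agrees         -- True if the sign of the unit's value agrees with
                      the desired classification
    response_wrong -- True if the current response is wrong
    probs          -- (P1, P2, P3) as defined in the text
    Returns (new_t, reassign), where reassign says whether the origins
    of the unit's connections should be redrawn.
    """
    p1, p2, p3 = probs
    if response_wrong:
        if active and not agrees:
            prob = p1
        elif not active and agrees:
            prob = p2
        elif not active and not agrees:
            prob = p3
        else:
            prob = 0.0  # active and agreeing: no case is defined, so no increment
        if rng.random() < prob:
            t += 1.0
    if t >= threshold:
        return 0.0, True          # reassign origins and reset the measure
    return t - decay * t, False   # assumed proportional decay
```

With `probs = (0.9, 0.3, 0.01)`, as in the figure captions, a unit that is repeatedly active with the wrong sign accumulates utility counts fastest and is therefore the first to have its connections redrawn.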

The results of several experiments (on horizontal/vertical bar
discrimination) are shown in Figures 69 and 70, with the performance curves
for the corresponding fixed-structure models shown for comparison. While
there seems to be a slight advantage for the variable-structure systems
(particularly in Figure 69, where only 20 A-units were used), the improvement
over the fixed-structure system is not impressive. Nonetheless, it is possible
that a more sophisticated procedure for determining which A-units are to be
changed would produce better results. It also seems likely that the horizontal/
vertical bar problem, which is not very demanding in the geometry of origin

Figure 69  EVOLUTIONARY MODEL, IN HORIZONTAL/VERTICAL BAR DISCRIMINATION. MEANS
OF 10 PERCEPTRONS. 50 A-UNITS, $x$ = 8, $y$ = 2, $\theta$ = 3,
$P_1$ = .9, $P_2$ = .3, $P_3$ = .01.

Figure 70  EVOLUTIONARY MODEL, IN HORIZONTAL/VERTICAL BAR DISCRIMINATION. MEANS
OF 10 PERCEPTRONS, ZERO RESPONSES COUNTED WRONG. 20 A-UNITS, $x$ = 8,
$y$ = 2, $\theta$ = 3, 3, $P_1$ = .9, $P_2$ = .2, $P_3$ = .01.

configurations required for discrimination, may be a poor choice of a calibration
experiment for evaluating the evolutionary model. Unfortunately, the procedure
is so time-consuming for a digital computer that only a small number of
experiments have proved feasible.

As a memory process, the above system seems excessively
complicated. Not only are three distinct probabilities required, under three sets of
logical conditions, but the utility measure must be stored as an auxiliary variable for each
A-unit. This is clearly implausible for a biological mechanism. The difficulties
encountered seem to be common with those met in all attempts at providing a
useful memory process which operates on the preterminal connections of the
network (as in the variable S-A systems of Chapter 13). It is hard to see what
simple criterion might be employed to identify those connections which should be
changed in order to improve the final output of the R-units. It seems likely that
a local information rule (Page 289) is incompatible with an efficient system of
reinforcement at the preterminal levels of the network.*

25.2 Systems with Make-Break Mechanisms for Synaptic Junctions

A somewhat different kind of structural modification from the model
described above is that in which there is a fixed set of "potential connections"
to each unit, but these connections may be either "made" or "broken" on an
all-or-nothing basis, in the manner of switches or mechanical relays. A
possible application of such a mechanism to the terminal network of a
three-layer perceptron is illustrated in Figure 71. The A-units are divided into a
set of excitatory units (E-units) whose output is always positive, and a set of
inhibitory units (I-units) whose output is always negative. All signals are of
unit amplitude, and the connections from I-units to the R-unit are fixed, only
the E-unit connections being modifiable. The connections from E-units to the
R-unit are of the make-break variety, the reinforcement rule being as follows:

* There is some hope, however, that the "elastic perturbation" system suggested
in Section 26.4 will prove applicable to this problem.


The reinforcement control system can call for a $\Delta v > 0$ or for
$\Delta v < 0$. If a positive increment is required, excitatory connections with active
origins are made with probability $P$ (applied independently for each unconnected
E-unit), while if a negative $\Delta v$ is required, excitatory connections with active
origins are broken with probability $P$. If the system begins with initial conditions
such that the number of connected E-units just balances the number of connected
I-units, and if the number of units is very large, the effect of a single
reinforcement will be identical to the application of a quantized α-system reinforcement
to a system with fixed A-R connections. Thus, under the error correction
procedure, this system can be expected to duplicate the performance of an
α-system perceptron quite closely, provided the number of A-units is large.
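
The make-break rule is simple enough to state as code. The sketch below is an illustration only: the function names, the list-of-booleans representation of the connections, and the use of Python's `random.Random` as the source of chance are assumptions introduced here, not details given in the text.

```python
import random


def make_break_step(connected, active, delta_sign, p, rng):
    """Apply one reinforcement to the E-unit connections.

    connected[i] -- True if E-unit i is currently connected to the R-unit
    active[i]    -- True if E-unit i is active for the current stimulus
    delta_sign   -- +1 if a positive increment is called for, -1 if negative
    p            -- probability P of making (or breaking) each eligible connection
    Returns the updated connection list; the input list is not modified.
    """
    updated = list(connected)
    for i, is_active in enumerate(active):
        if not is_active:
            continue  # only connections with active origins are eligible
        if delta_sign > 0 and not updated[i] and rng.random() < p:
            updated[i] = True    # make an unconnected excitatory connection
        elif delta_sign < 0 and updated[i] and rng.random() < p:
            updated[i] = False   # break an existing excitatory connection
    return updated


def r_unit_input(e_connected, e_active, i_connected, i_active):
    """Net input to the R-unit, with all signals of unit amplitude."""
    excitation = sum(1 for c, a in zip(e_connected, e_active) if c and a)
    inhibition = sum(1 for c, a in zip(i_connected, i_active) if c and a)
    return excitation - inhibition
```

Because each eligible connection is switched independently, the expected change in the R-unit input is proportional to the number of eligible active E-units, which is what allows the large-network behavior to approximate a quantized value increment.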


An alternative system is one with equal numbers of E- and I-units,
in which the I-connections are also variable. In this case, new connections can
be made, but once established are assumed to be permanent. For $\Delta v > 0$, new
E-connections are formed with probability $P$, as above. For $\Delta v < 0$, however,
new I-connections are formed with probability $P$, instead of breaking
E-connections. At the outset, assuming that all A-units are initially disconnected, this
system again behaves in much the same way as an α-system perceptron. As
the system "saturates", due to the exhaustion of available connections, the
increments to the R-unit input signal from each new reinforcement become
progressively smaller. If the number of A-units is infinite, then the system
never saturates entirely, new reinforcements always having some effect,
although this is apt to become negligible as saturation is approached.
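
The shrinking of the increments can be illustrated with a short calculation under a simplifying assumption that is not made in the text: suppose the same $n$ E-units, all initially unconnected, are active at each of a sequence of positive reinforcements. Since each unconnected active unit is connected independently with probability $P$, the expected number still unconnected after $k$ reinforcements, and the expected increment produced by the next one, are:

```latex
\mathbb{E}[\text{unconnected after } k] = n(1-P)^{k},
\qquad
\mathbb{E}[\Delta u_{k+1}] = nP(1-P)^{k}.
```

Here $\Delta u_{k+1}$ is a symbol introduced for this example, denoting the change in the R-unit input signal for unit-amplitude signals. The expected increment falls off geometrically, and the increments sum to $n$, the point at which every available connection has been used.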

These models are of more interest as possible analogs for biological
systems than as significantly new types of perceptrons. Their properties,
short of the saturation condition, closely resemble the systems previously
considered, but they do not require values which change sign, and are
suggestive of a possible synaptic growth mechanism in biological memory. As
engineering devices, their reliance on probabilistic mechanisms is apt to make their
construction more difficult than the α-system models.

26. BIOLOGICAL APPLICATIONS OF PERCEPTRON THEORY

When  the  perceptron  was  first  proposed,  it  was  considered 
primarily  as  a  model  of  biological  memory  mechanisms.  As  the  models 
became  more  sophisticated,  a  number  of  psychological  properties  not  directly 
related  to  memory  were  investigated,  but  the  main  emphasis,  as  a  biological 
model,  is  still  on  the  adaptive  mechanisms  employed,  and  the  recording  of 
past  experience.  In  this  chapter,  the  application  of  perceptron  theory  to 
biological  problems  will  be  considered  primarily  from  this  point  of  view. 

26.1 Biological Methods for the Achievement of Complex Structures

The  biological  evidence  which  has  been  cited  repeatedly  throughout 
this  volume  indicates  that  highly  organized  structural  constraints  exist  in 
many parts of the nervous system. Apart from the gross anatomical
complexity of the brain, the mechanisms of optic nerve growth and regeneration, the
stimulus analyzing mechanisms found by Lettvin in the frog and by Hubel in the
cat,  and  the  better  known  mechanisms  of  motor  coordination  and  control 
indicate  that  organization  of  a  rather  involved  type  may  occur  even  in  the  fine 
structure  of  the  network.  In  perceptron  theory,  as  it  has  developed  to  date, 
most  emphasis  has  been  placed  on  learning  and  memory  as  a  means  of 
achieving  such  organization.  In  actuality,  a  number  of  alternative  procedures 
are  possible  for  the  creation  of  complex  networks,  satisfying  a  given  set  of 
logical  constraints.  These  include: 

1. Logical specification (e.g., let the $i$th cell of the $k$th row be
connected to the $(i+1)$st cell of the $(k+3)$rd row, for all $i > k$).
This is equivalent to an exact blueprint of the network.

2. Natural selection, whereby the useful sub-networks of an
originally random population survive, while the others decay.

3.  Simple  spatial  constraints  (gradients,  directional  bias,  or 
distributions  of  connections  specified  by  a  small  number  of 
parameters). 

4. Typological constraints (e.g., cells of Type A can only connect
to  cells  of  Type  B  or  C,  where  cell  types  might  be  distinguished 
by  chemical  properties). 
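
The contrast between the first and fourth methods can be made concrete with a small sketch. The code below is purely illustrative: the grid dimensions, the function names, and the table of permitted cell types are hypothetical, chosen only to mirror the examples given in items 1 and 4.

```python
def logical_specification(rows, cols):
    """Item 1: an exact blueprint of the network.

    Connects cell i of row k to cell i+1 of row k+3 for all i > k,
    using 1-based indices and keeping only targets inside the grid.
    """
    return [((k, i), (k + 3, i + 1))
            for k in range(1, rows - 2)   # rows k with k + 3 <= rows
            for i in range(1, cols)       # cells i with i + 1 <= cols
            if i > k]


# Item 4: a hypothetical table of which cell types may connect to which.
ALLOWED_TARGETS = {"A": {"B", "C"}}


def typological_ok(src_type, dst_type, allowed=ALLOWED_TARGETS):
    """True if a cell of src_type is permitted to connect to dst_type."""
    return dst_type in allowed.get(src_type, set())
```

The blueprint fixes every connection individually, whereas the type table only restricts which pairings are admissible and leaves the particular connections to chance or to other constraints.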

Of these four mechanisms, only the last three seem to be well
suited for the development of biological nerve nets. The first mechanism,
logical specification of the structure, is primarily a contrivance of
engineering, which is well suited to the construction of computers, but which seems to
have no clear counterpart in known mechanisms of growth and maturation.
It is this first method of control, however, which has been most investigated
in studies of brain mechanisms during the last few decades (e.g., References
17, 57, 71).


In specifying the initial physical form of the networks in perceptron
theory, most attention has been given to the third alternative; spatial constraints
of a simple sort have been employed throughout. In the last chapter, limited
use was made of the second and fourth methods. Typological
constraints have thus far been used mainly to distinguish excitatory from
inhibitory neurons (Section 25.2), but it seems likely that their use is relatively
widespread in biological systems. In particular, Sperry's work on neural
maturation and fiber regeneration, and that of Lettvin and Maturana on the
regeneration of scrambled connections in the frog's brain, suggest a chemical control
or "homing mechanism" of remarkable sensitivity.

The  limited  experiments  performed  thus  far  on  "natural  selection" 
as a structural control mechanism do not appear particularly promising
(Section 25.1). The evolution of the network occurs too slowly, and is too
subject  to  disruption  and  instability  of  partially  achieved  organizations,  to  be 
useful  in  any  of  the  forms  examined  up  to  this  point.  It  remains  possible, 
however,  that  a  more  rapidly  converging  mechanism  may  be  found,  and  the 
field  remains  open  for  future  investigation.  Typological  constraints,  on  the 
other hand, are likely to come into their own with the investigation of
perceptrons  having  complex  mixtures  of  property  detectors,  and  other 
specialized  A-units,  all  deriving  their  connections  from  a  common  sensory 
field. 

26.2 Basic Types of Memory Processes

Perceptron memory mechanisms have all taken the form of
modifications of the signals transmitted across synaptic junctions. There
appear to be at least two basic types of memory dynamics which are useful
in perceptrons. The first is a system in which values remain stable unless
action is taken by a reinforcement control system, based upon an evaluation
of the current response of the perceptron. The most effective method actually
investigated for this purpose has been the α-system, with an error
correction procedure for modifying the values of A to R-unit connections. The
second type of memory is one which achieves stability only in the form of a
dynamic equilibrium with a continuously active reinforcement process. This
second system does not depend upon evaluation of the perceptron's output, but
maintains a continuous state of adaptation in the network, based only on local
activity. In practice it seems likely that a decaying γ-system will prove
to be the best of the systems of this type which have been analyzed. The
first type of mechanism permits the system to learn from an external
"teacher", or by reward and punishment experienced as a result of trial and
error activity. The second type permits the perceptron to acquire an internal
model of the "similarity structure" of its environment, as defined by the temporal
relationships of moving stimuli. It may be that more complex forms of
organization (such as the recognition of connected patterns, or Gestalten) can also
be achieved by means of dynamic processes of the second type, but this remains
conjectural at this time.

While it is certainly conceivable that additional basic mechanisms
may be required to perform the memory tasks of a complex organism, there
seems to be some reason to believe that the two types of dynamics
characterized above may prove sufficient for the phenomena of "adaptive behavior".
The first variety permits the system to be "set" passively to any desired state,
which  will  then  be  retained  indefinitely.  Thus  any  form  of  permanent  learning 
can  be  handled,  in  principle,  by  such  a  system.  The  error  correction  theorems 
of  Chapters  5  and  10  seem  sufficient  to  demonstrate  this  assertion.  On  the 
other  hand,  any  spontaneous  modification  process  which  is  not  to  be  self- 
defeating  must  ultimately  achieve  some  sort  of  dynamic  equilibrium  with  the 
conditions  which  induce  the  change  in  state;  without  such  a  mechanism  (provided 
in the case of our four-layer and cross-coupled perceptrons by the decay term
in  the  equations)  the  dynamic  range  of  the  memory  variables  must  ultimately 
be  exhausted,  and  the  system  will  saturate.  In  any  case,  a  mechanism  which 
is  to  serve  as  a  basis  for  generating  a  model  of  the  external  environment  must 
be  one  which  ultimately  approaches  a  stable  condition,  as  the  model  approaches 
a  true  representation  of  the  external  world.  Such  considerations  make  the 
second  mechanism  appear  to  be  a  natural  complement  to  the  first. 
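
The role of the decay term can be seen in a minimal numerical sketch. The code assumes, purely for illustration, a single memory variable that receives a constant reinforcement increment at every step and loses a fixed fraction of its current value; the function name and parameters are hypothetical and are not taken from the equations of the earlier chapters.

```python
def run_memory(steps, increment, decay):
    """Iterate v <- v + increment - decay * v, starting from v = 0.

    With decay > 0 the value approaches the equilibrium increment / decay;
    with decay == 0 it grows without bound as the number of steps increases.
    """
    v = 0.0
    for _ in range(steps):
        v += increment - decay * v
    return v


with_decay = run_memory(500, 1.0, 0.1)     # settles near 1.0 / 0.1
without_decay = run_memory(500, 1.0, 0.0)  # keeps growing with the step count
```

The first case reaches a dynamic equilibrium in which the reinforcement received at each step is balanced by the amount lost to decay. The second would eventually exhaust any finite dynamic range, which is the saturation described above.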

Two  memory  functions  which  might  call  for  processes  of  a  different 
logical  character  are  the  serial  recording  of  experience  (in  the  manner  of  a 
tape recorder or motion picture camera) and a temporary memory for data which
are to be used in the immediate future and then forgotten (as in the "memory"
of a digital computer). For the second of these phenomena, it is likely that a
dynamic storage mechanism, such as pools of activity or reverberating loops
which can be triggered and extinguished by a suitable control system, will prove
to be the most effective storage mechanism. The problem of serial memory is
a more serious one, but can only be dealt with together with the problem of
selective recall and the mechanisms for its control.


It is certain that in a simple perceptron, memories are not tagged
in any way which would permit their serial order to be re-established later.
But the "memories" stored in a simple perceptron are in any case merely
associative, rather than substantive. The nature of substantive memory in
humans must be investigated more carefully in the future. While it seems
unlikely  that  a  complete  image  or  state  of  the  association  system  is  stored,  it 
is  nonetheless  clear  that  a  great  deal  more  information  is  retained  than  is 
represented  by  a  simple  classification  of  an  experience  as  belonging  to  one  of 
n  categories.  One  alternative  is  that  of  storing  a  description  of  a  large  number 
of  characteristics  or  dimensions,  which  jointly  permit  the  reconstruction  of 
the  original  experience  by  the  active  creation  of  a  model,  or  image,  which 
approximates the original state of the association system. Among the
characteristics stored would be such time-tagging information as the location in which
the event occurred, the time of day, the activity that the subject was engaged in,
etc. An accumulation of such cues would enable a suitable search process to
locate the experience in time, and to associate it with preceding or successive
events in appropriate order (cf. Reference 79, Chapter VIII). In any case,
it  seems  likely  that  substantive  recall  is  an  active,  creative  (or  recreative) 
process,  rather  than  merely  a  passive  reading-out  of  a  memory  bank. 

26.3 Physical Requirements for Biological Memory Mechanisms

From  the  considerations  just  stated,  it  should  be  clear  that  not  one 
but  several  memory  mechanisms  are  likely  to  be  encountered  in  a  complex 
system.  Limiting  our  attention,  for  present  purposes,  to  the  two  basic  mechanisms 
which  have  been  studied  in  perceptrons,  what  can  we  say  as  to  the  probable 
physical  characteristics  of  the  memory  traces? 

First, as to location: it appears that the most suitable location is in the
connections, or synapses, which mediate the interaction of particular pairs of
neurons. Perceptrons in which the memory trace affects an entire neuron and
all of its interactions with other neurons have been investigated (Reference 79),
but this has invariably involved the introduction of artificial constraints on the
topology or logic of the network, in order to limit the effects of reinforcement
to the desired transmission channels. In any case, systems in which the
reinforcement is specific to the connections appear to be far more economical
than those in which reinforcement is applied to an entire neuron, or A-unit.

A second condition is that the memory change should be reversible.
Both the externally controlled error-correction procedure and the fully automatic
memory processes of the cross-coupled perceptrons require reversible
modifications. In the case of the error-correction procedure, two antagonistic control
mechanisms seem to be called for, one of which strengthens the excitatory outputs
of active A-units, and the other of which weakens excitatory outputs or strengthens
inhibitory outputs. While most of our analyses have assumed that the actual sign
of the value of a connection may change from positive to negative, this is clearly
a non-biological artifact, introduced for convenience in analysis. The same effects
could be achieved by a system in which half of the connections are always positive,
and half are always negative. If the negative connections are fixed in magnitude,
then only the excitatory connections need be modified, yielding a net positive
signal if they exceed the strength of the fixed inhibitory component, and a net
negative signal if they fall below the inhibitory strength. Alternatively, the
excitatory connections might be fixed, and the inhibitory connections variable,
or each type might be variable within its own dynamic range.

The requirement that the "strength" or value of a connection be
modified as a consequence of the correlated activity of both terminal units,
rather than just the transmitting unit, appears to place a unique condition on the
memory process. Most metabolic processes such as growth, changes in cell
chemistry, etc., which might be involved here are of a type which generally depend
only upon the cell in which the change occurs, and its over-all environment,
whereas we seem to require a two-factor phenomenon, which depends upon the
activity of two specific cells. This writer has previously stated the conjecture
(Reference 83) that the required effect might be obtained if the production of
transmitter substances depended upon an enzyme or catalyst produced in the
nucleoplasm of the trans-synaptic cell, and released to the medium when that
cell is stimulated to activity. The presynaptic fibers which were most recently
active, being in a heightened metabolic state, would then be in the most favorable
position to compete for the limited supply of this catalyst, which would then
enable them to produce their transmitter substance at an increased rate in the
future. The competition for metabolites in limited supply in the neighborhood
of a particular cell body would tend to create a γ-system, in which the most
active cells would gain at the expense of the inactive ones. Whether this is a
correct description of the mechanism or not, some type of symbiotic relationship
seems to be demanded between the presynaptic fibers and the post-synaptic cell,
in order to provide a memory mechanism of the type analyzed in Part III of this
volume.


The memory mechanism employed for error-correction learning
places  rather  different  demands  on  the  biological  system.  Here  the  reinforcement 
depends  not  so  much  on  the  correlation  of  activity  of  the  two  terminal  units,  as  on 
the  correlation  of  the  activity  of  the  transmitting  unit  with  the  decisions  of  the 
reinforcement  control  system.  It  is  conceivable  that  this  might  again  involve 
the  release  of  a  catalyst  in  the  neighborhood  of  the  active  connections,  but  in  this 
case  the  release  must  be  remotely  controlled  --  perhaps  through  glandular  action. 
In  one  respect  this  is  a  simpler  requirement,  conceptually,  than  the  former  case, 
where  the  activity  of  two  specific  cells  had  to  be  considered  for  each  connection 
which  might  be  reinforced.  In  the  present  case,  the  general  release  of  an 
excitatory  or  inhibitory  reinforcing  agent  from  a  central  source  would  appear  to 
be  sufficient;  the  recently  active  connections,  being  most  metabolically  active, 
would  tend  to  be  most  strongly  affected.  In  a  second  respect,  however,  this 
mechanism presents a new problem which is more serious: the problem of
limiting  the  effect  of  reinforcement  to  the  specific  response  which  is  to  be 
corrected. 


It  was  demonstrated  in  Chapter  12  that  the  error  correction  procedure 
can  be  guaranteed  to  work  only  if  the  correction  is  limited  to  the  erroneous 
responses, in a multiple response system. To achieve this condition in a
biological system, it seems that a mechanism is called for which can select one response,
or response component, at a time as a candidate for reinforcement, and limit the
corrective action to the selected locality. In dealing with motor responses, the
topographical mapping of the motor control areas of the cortex is likely to prove
helpful here, particularly if we adhere to the hypothesis that the memory trace
involves the release of a chemical agent which affects everything in its
neighborhood.


The proportional decay mechanism which is required for the
"spontaneous" memory process is probably the easiest of the requirements to rationalize
in a biological model; a chemical mechanism, in particular, would tend to exhibit
decay at a rate which increases with the concentration.

At  present,  any  treatment  of  the  compatibility  of  perceptron  theory 
with  biological  memory  mechanisms  must  remain  entirely  speculative.  It  is 
to be hoped that as additional evidence on synaptic transmission and neurochemistry
comes to light, it can be fitted into the picture. Thus far, there seem to be no
serious conflicts, although there are a number of missing links. The
considerations stated above do suggest several plausible hypotheses for experimental
investigation. 


A procedure is now being investigated by which an error correction is applied
to a randomly chosen set of R-units, the value increments being transient
rather than permanent, unless the correction actually proves effective. It is
hoped that this technique will yield an efficient reinforcement mechanism which
does not depend on specification of the erroneous R-units (see Section 26.4).

26.4 Mechanisms of Motivation

The problem of motivation for perceptrons, considered as models
for biological nervous systems, has hardly been treated adequately up to this
time. The reinforcement control system, which forms part of the experimental
system, plays the role of a sort of deus ex machina, which not only has
knowledge of right and wrong responses, but can control the distribution of
reinforcement to individual R-units in the perceptron, as required. A more
"natural" system with only a slight reduction of efficiency does seem to be
possible, however, although at present the model proposed is a heuristic one,
on which no quantitative analysis has been completed.

The proposed model for biological reinforcement mechanisms is
illustrated in Figure 72. In this system, the r.c.s. is no longer external to
the system, but is essentially part of the perceptron. It is assumed that the
perceptron system includes a sensing device for a physiological condition
which has been arbitrarily called the "discomfort level", measured by the
variable D. This might be compared to Ashby's concept of "essential variables".
In addition to continuously measuring the variable D, which is assumed for
simplicity to be some function of the current stimulus pattern, a second
mechanism (readily represented by a neuron with inhibitory input connections
with a short time delay and excitatory connections with a longer time delay,
both originating from the "D-detector") responds to a negative dD/dt. The
corrections to this system are random perturbations applied either to active
connections, or to all connections of the perceptron; the increments, however,
take the form of "elastic perturbations", so that the connections tend to decay
back to their previous values unless a "positive reinforcement" occurs to "fix"
the new values. Thus negative reinforcement applies a slight random
perturbation, which tends to disappear unless it actually proves helpful, in which case it
is stabilized by a positive reinforcement.

Figure 72  EXPERIMENTAL SYSTEM EMPLOYING ELASTIC PERTURBATIONS, STABILIZED
BY IMPROVEMENTS IN SENSORY SITUATION (COMPARE Figure >1).

For  this  system  to  function  efficiently,  it  is  again  necessary  to 
assume  some  degree  of  temporal  continuity  in  the  environment,  so  that  the 
change  in  D  indicates  a  true  improvement  in  the  response  of  the  system,  rather 
than  an  irrelevant  change  due  to  a  sudden  alternation  of  the  environment. 
Preliminary  simulation  experiments  to  evaluate  this  scheme  are  now'in  progress, 
employing  the  Burroughs  220  computer,  and  indicate  that  the  system  should  work 
with  a  reasonable  degree  of  efficiency,  as  compared  to  a  system  employing  a., 
more  deterministic  error  correction  procedure.  The  results  of  these  experi¬ 
ments  v/ill  be  reported  as  soon  as  the  data  are  complete.  The  system  has  the 
advantage  that  it  works  well  with  an  arbitrarily  large  number  of  R-units, 
without  requiring  an  individual  decision  as  to  the  error  of  each  one,  as  long  as 
D  is  some  monotone  increasing  function  of  the  joint  error,  such  as  the  norm  of 
the  difference  vector,  i/  -'-'il.  Such  a  representation  will  work  best  when  all 
of the R-units are continuous transducer units, so that any random
value-perturbation will have a 0.5 probability of yielding an improvement.
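
The perturbation scheme described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the program run on the Burroughs 220: the R-units are taken to be linear continuous transducers, D is taken to be the Euclidean norm of the difference between desired and actual outputs, the environment is a single fixed stimulus (the extreme case of temporal continuity), and a perturbation is applied at every step. The names `discrepancy`, `train_elastic`, `scale`, and `decay`, as well as the Gaussian perturbation and the geometric decay, are hypothetical choices made for the example.

```python
import math
import random


def discrepancy(values, inputs, targets):
    """D: norm of the difference between desired and actual R-unit outputs."""
    total = 0.0
    for row, target in zip(values, targets):
        output = sum(v * x for v, x in zip(row, inputs))  # linear transducer
        total += (target - output) ** 2
    return math.sqrt(total)


def train_elastic(inputs, targets, steps=2000, scale=0.05, decay=0.5, seed=0):
    """Random elastic perturbations, fixed only when D decreases."""
    rng = random.Random(seed)
    n_r, n_in = len(targets), len(inputs)
    fixed = [[0.0] * n_in for _ in range(n_r)]    # stabilized connection values
    elastic = [[0.0] * n_in for _ in range(n_r)]  # transient displacements
    d_fixed = discrepancy(fixed, inputs, targets)
    history = [d_fixed]
    for _ in range(steps):
        # Negative reinforcement: a slight random perturbation of all connections.
        elastic = [[e + rng.gauss(0.0, scale) for e in row] for row in elastic]
        trial = [[f + e for f, e in zip(frow, erow)]
                 for frow, erow in zip(fixed, elastic)]
        d_trial = discrepancy(trial, inputs, targets)
        if d_trial < d_fixed:
            # Positive reinforcement (negative dD/dt): fix the new values.
            fixed, d_fixed = trial, d_trial
            elastic = [[0.0] * n_in for _ in range(n_r)]
        else:
            # No improvement: the perturbation decays back toward the old values.
            elastic = [[e * decay for e in row] for row in elastic]
        history.append(d_fixed)
    return fixed, history


# Two R-units driven by three input signals (illustrative numbers only).
values, history = train_elastic([1.0, 0.5, -0.3], [0.8, -0.2])
print("initial D:", history[0], "final D:", history[-1])
```

Because a perturbation is retained only when D falls, the recorded value of D never increases from one step to the next, and no decision about the error of any individual R-unit is required.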

27. CONCLUSIONS AND FUTURE DIRECTIONS

Man's  intelligence  is  a  unique  phenomenon  on  our  planet,  occurring 
at  such  a  level  of  complexity  in  a  single  species  only.  The  lack  of  other 
similarly  intelligent  species  is  unfortunate  from  the  standpoint  of  science, 
for  it  makes  it  difficult  to  tell  from  comparative  evidence  which  features  of 
human psychology are accidental products of man's peculiar biological constitution,
and which are fundamental to the nature of intelligence itself. Despite
this  lack  of  comparative  material,  some  of  us  believe  that  it  may  ultimately  be 
possible  to  answer  such  questions  through  an  understanding  of  the  physical  basis 
of  psychological  phenomena,  independently  of  the  biology  of  any  one  species.  The 
perceptron  program  represents  a  small  part  of  such  an  undertaking;  it  is  an 
attempt to study the psychological properties of certain highly simplified mathematical
or physical models of the central nervous system, in the hope that such
a  study  may  throw  light  on  basic  principles  which  can  then  be  applied  to  more 
sophisticated  models. 

The use of "models" to represent complicated natural phenomena
has been an essential technique in the physical sciences for many centuries.
The model is a simplified theoretical system, which purports to represent the
laws and relationships which hold in the real physical universe. The solar
systems of Ptolemy, Copernicus, and Einstein, and the atomic models of
Democritus,  Bohr,  and  Heisenberg  represent  two  successions  of  such  models, 
each  in  turn  coming  somewhat  closer  to  an  adequate  representation  of  its  subject 
matter.  In  some  cases  (the  concept  of  an  "ideal  gas"  for  example)  the  model 
deliberately  neglects  certain  complicating  features  of  the  natural  phenomena 
under  consideration,  in  order  to  obtain  a  more  readily  analyzed  system,  which 
will  suggest  basic  principles  that  might  be  missed  among  the  complexities  of 
a  more  accurate  representation.  Such  simplified  models  may  then  be  refined 
through a series of "perturbations", which introduce the known complications
one at a time, in a manner which permits the mathematician to incorporate them
into his analysis. It is this approach which has been most characteristic of the
perceptron  program. 

Stated  in  simplest  terms,  our  objective  has  been  to  discover  a 
physical system, or abstract model, which will be capable of "perceiving" its
environment,  and  learning  to  recognize  those  objects  or  events  which  it  has 
perceived  in  the  past.  However,  since  it  is  our  purpose  to  understand  the  actual 
mechanisms  employed  by  the  brain,  rather  than  simply  to  construct  a  new  type 
of  computing  device,  the  perceptron  models  are  constrained  in  their  organization 
and  dynamic  properties  by  what  is  known  of  the  biological  nervous  system.  Rather 
than attempting to "invent" or "construct" a machine which will calculate such
things  as  similarities  or  geometrical  properties  of  stimuli,  the  approach  has 
been  to  begin  with  a  hypothetical  network  of  idealized  neurons,  or  nerve  cells, 
resembling  the  brain  in  its  general  organization,  and  then  analyze  the  system 
mathematically  to  determine  whether  or  not  it  possesses  "psychological" 
properties  of  interest.  Where  the  model  is  found  to  deviate  markedly  from  the 
behavior  of  biological  systems,  modifications  are  suggested,  and  the  new  model 
that  results  is  subjected  to  the  same  sort  of  analysis.  In  this  fashion,  it  is  hoped 
that  the  necessary  conditions  for  a  system  to  "perceive"  in  the  same  manner  as 
the  brain  can  be  abstracted. 

In this chapter, we will attempt to summarize the principal results
which  have  thus  far  emerged  from  this  approach,  the  problems  which  have  now 
come  to  the  foreground,  and  the  means  by  which  these  problems  might  be  attacked. 
The  possible  applications  of  perceptron  theory  to  engineering  devices  and  the 
construction  of  physical  brain  models  will  also  be  considered.  Finally,  an  attempt 
will be made to anticipate the future relationship of the neurodynamic approach to
the  various  alternative  strategies  by  which  the  problems  of  understanding  and 
simulating intelligence are being investigated.

27.1 Psychological Properties in Neurodynamic Systems


Our main conclusions deal with the properties of closed experimental
systems, such as those illustrated in Figures 3, 4, and 72. It has been shown
that as the topological organization of the perceptron increases in complexity,
new psychological properties emerge. The principal results can be summarized
as follows:

(1) A network consisting of fewer than three layers of signal transmission
units, or a network consisting exclusively of linear elements connected
in series, is incapable of learning to discriminate classes of
patterns in an isotropic environment (where any pattern can occur
in all possible retinal locations, without boundary effects).

(2) A three-layer series-coupled perceptron is a minimal system capable
of learning to discriminate arbitrary classes of stimulus patterns
or stimulus sequences. Any discrimination problem can, in principle,
be solved by such a system, and any arbitrary response function
can be assigned to the stimuli of a given universe.

(3) By means of an α-system error-correction procedure, a three-layer
series-coupled perceptron with simple A-units and a fixed
preterminal network can always be taught the solution to any classification
problem or response function for which a solution exists.

(4) Equations for the learning curves of simple perceptrons under
various reinforcement rules have been presented. The results
indicate that for simple tasks, such as the recognition of large
alphabetic characters against a plain background, the three-layer
series-coupled system performs with reasonable efficiency,
although it may require a lengthy training procedure with large
samples of each stimulus class to guarantee recognition of all
variations, or "allomorphs" of a pattern.

(5) In perceptrons with variable-valued preterminal networks, a non-deterministic
reinforcement rule may be required to guarantee that
the solution to a classification problem will be achieved, given that
the solution exists.

(6) Generalization capabilities of three-layer series-coupled systems
are  poor,  and  in  "pure  generalization"  experiments  (where  the  test 
stimuli  have  no  sensory  points  in  common  with  the  training  stimuli) 
there  is  essentially  no  generalization  capability. 

(7) Series-coupled perceptrons with randomly organized origin-point
configurations  for  the  A-units  tend  to  be  highly  resistant  to  stimulus 
noise  and  network  damage;  in  a  complex  field  containing  mixtures  of 
familiar  stimuli,  however,  they  are  easily  confused,  and  are  incapable 
of  responding  selectively  to  one  stimulus  or  object  at  a  time. 

(8) The addition of a fourth layer of signal transmission units, or
cross-coupling the A-units of a three-layer perceptron, permits
the solution of generalization problems over arbitrary transformation
groups.

(9) Four-layer and cross-coupled systems with suitable rules for
modifying  their  connection  values  (Chapters  16,  17,  and  19)  are 
capable  of  learning  a  group  of  transformations  which  have  occurred 
commonly  in  sequences  of  stimuli,  and  later  recognizing  the 
similarity  of  stimuli  which  are  equivalent  under  the  observed 
transformation  group.  This  phenomenon  occurs  "spontaneously", 
without  any  external  influence  on  the  perceptron  apart  from  the 
occurrence  of  stimuli. 

(10) In back-coupled perceptrons, selective attention to familiar objects
in a complex field can occur. It is also possible for such a perceptron
to attend selectively to objects which move differentially relative to
their background.

(11) By a suitable combination of geometric constraints (Chapter 23)
a multi-layer perceptron can be enabled to recognize detailed
patterns in high-resolution fields with markedly increased efficiency,
compared to a randomly organized three-layer system. For a given
universe of stimuli, there will be an optimum organization of such
a system, which will rarely exceed three layers of A-units for
tasks commensurate with human capabilities under tachistoscopic
conditions.

(12) A number of speculative models which are likely to be capable of
learning sequential programs, analyzing speech into phonemes,
and learning substantive "meanings" for nouns and verbs with
simple sensory referents have been presented in the preceding
chapters. Such systems represent the upper limits of abstract
behavior in perceptrons considered to date. They are handicapped
by the lack of a satisfactory "temporary memory", by an inability to
perceive abstract topological relations in a simple fashion, and by
an inability to isolate meaningful figural entities, or objects,
except under special conditions.

The  capabilities  which  are  outlined  above,  and  the  variety  of  networks 
and  dynamic  principles  considered,  map  out  a  substantial  territory,  much  of 
which  still  remains  to  be  explored  in  detail.  While  rudimentary  perceptual 
behavior  appears  to  be  present  in  these  systems,  it  seems  likely  that  to  deal 
adequately  with  the  problems  of  complex  perceptual  fields  and  the  recognition 
of  abstract  relations  between  objects  or  events,  additional  principles  must 
still  be  found. 
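
Results (2) and (3) above can be made concrete with a small Python sketch of error correction in a three-layer series-coupled system with a fixed preterminal network. It is an illustration under stated assumptions rather than a reproduction of any program used in this work: S-unit signals are binary, each A-unit receives a fixed number of excitatory and inhibitory connections from randomly chosen S-units, the R-unit gives +1 when its summed input is positive and -1 otherwise, and an error changes the values of the active A-unit connections by a fixed amount in the direction of the desired response. The names `make_a_units`, `a_activity`, `train`, and `eta`, and the 4 x 4 retina of bar patterns, are hypothetical.

```python
import random


def make_a_units(n_s, n_a, x, y, rng):
    """Fixed preterminal network: each A-unit receives x excitatory and
    y inhibitory connections from randomly chosen S-units."""
    units = []
    for _ in range(n_a):
        origins = rng.sample(range(n_s), x + y)
        units.append((origins[:x], origins[x:]))
    return units


def a_activity(stimulus, units, theta=1):
    """Activity of each simple A-unit: 1 if its input reaches theta, else 0."""
    activity = []
    for excitatory, inhibitory in units:
        alpha = (sum(stimulus[s] for s in excitatory)
                 - sum(stimulus[s] for s in inhibitory))
        activity.append(1 if alpha >= theta else 0)
    return activity


def train(stimuli, labels, units, eta=1.0, theta=1, max_epochs=1000):
    """Error correction: on a wrong response, change the values of the active
    A-unit connections by eta in the direction of the desired response."""
    values = [0.0] * len(units)
    activities = [a_activity(s, units, theta) for s in stimuli]
    for epoch in range(max_epochs):
        errors = 0
        for activity, label in zip(activities, labels):
            u = sum(v * a for v, a in zip(values, activity))
            response = 1 if u > 0 else -1
            if response != label:
                errors += 1
                values = [v + eta * label * a for v, a in zip(values, activity)]
        if errors == 0:
            return values, epoch  # a full pass with no errors
    return values, None  # no error-free pass within max_epochs


# Horizontal versus vertical bars on a 4 x 4 retina (illustrative only).
rng = random.Random(0)
units = make_a_units(n_s=16, n_a=40, x=3, y=2, rng=rng)
rows = [[1 if i // 4 == r else 0 for i in range(16)] for r in range(4)]
cols = [[1 if i % 4 == c else 0 for i in range(16)] for c in range(4)]
values, epochs = train(rows + cols, [1] * 4 + [-1] * 4, units)
print("first error-free pass:", epochs)
```

Only the A-to-R values are modified; the S-to-A connections stay fixed throughout. Whether a given random network admits a solution for a given dichotomy is a separate question, which is why `train` may return `None`.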

27.2 Strategy and Methodology for Future Study

A number of the perceptrons considered in the preceding chapters have
been analyzed in a purely formal way, yielding equations which are not readily
translated into numbers. This is particularly true in the case of the four-layer
and cross-coupled systems, where the generality of the equations is reflected
in the obscurity of their implications, except for the few cases where explicit examples
have been worked out. For other models, only qualitative results are
available, although the way is clear for quantitative work to be initiated. Those
problems which appear to be foremost at this time include the following:

(1) Theoretical learning curves for the error correction procedure.
(At present, only empirical results are available, and no
attempts at theoretical analysis have proven successful.)

(2) Determination of the probability that a solution exists to a
given problem, for a perceptron drawn from a specified class.

(3) The development of optimum codes for the representation
of complex environments, in perceptrons with multiple R-units
(see Section 12.2).

(4) Development of an efficient reinforcement scheme for preterminal
connections (cf. Chapter 13).

(5) Optimum organization of stimulus analyzing mechanisms and
networks with geometrically constrained connections (cf.
Chapter 23).

(6) Terminal performance of cross-coupled and four-layer perceptrons
in generalization experiments, as a function of network
parameters, reinforcement dynamics, and environment
characteristics.

(7) Theoretical analysis of convergence time and learning curves
for adaptive four-layer and cross-coupled perceptrons.

(8) Quantitative studies of effects of threshold servos on system
performance (cf. Chapter 21).

(9) Quantitative studies of speech recognition and phoneme analyzing
systems.

(10)  Performance  of  back-coupled  systems  in  selective  attention  and 
detection  experiments. 

(11)  Quantitative  studies  of  sequential  program  learning  in  back- 
coupled  systems. 

(12) Effect of spatial constraints in cross-coupled systems (e.g.,
limiting interconnections to pairs of A-units with adjacent
retinal fields).

(13) Studies of possible figure-segregation (figure-ground) mechanisms.

(14)  Studies  of  abstract  concept  formation,  and  the  recognition  of 
topological  or  metrical  relations. 

(15)  Biological  memory  mechanisms,  and  studies  of  neurophysiology 
in  relation  to  perceptron  theory. 

Four  basic  techniques  are  available  for  the  study  of  these  problems: 
theoretical  analysis,  digital  simulation,  the  construction  of  physical  models, 
and physiological experimentation. The first two problems of the above list
are specifically mathematical in character. The third, while posed as a
theoretical question, might best be investigated at the outset by means of simulation
studies. In the case of problems (4) and (5), simulation studies seem to
be indicated for preliminary exploration, although it is hoped that some theoretical
formulations may ultimately be achieved. The sixth problem, the
determination of terminal performance of adaptive four-layer and cross-coupled
systems, calls in effect for a variety of explicit solutions to the steady-state
equations presented in Part III. Such a program is currently being carried out
both by direct computation of the equations and by simulation techniques. For
the cross-coupled systems, simulation is likely to prove more economical in
most cases than the numerical solution of the equations. The seventh question
again is a theoretical one, although preliminary results obtained from simulation
programs should prove enlightening. The problem of threshold servomechanisms
can be investigated both by theoretical means and by simulation.

It has recently been proposed that an audio-perceptron should be
constructed at Cornell University to study the problem of speech recognition.
Since this is a problem in which the chief interest is in performance under
typical environmental conditions, rather than in theoretical problems of pattern
recognition (which have all been solved on paper, insofar as spoken inputs
resemble any other form of sensory sequences), it seems best to provide for
convenient input to a real-time system, rather than working with simulated
perceptrons and samples of digitalized speech. The problem of phoneme analysis,
however, still presents enough theoretical difficulty, and enough uncertainty as
to the best solution, that a digital simulation program is indicated. The system
proposed in Chapter 23 is now being investigated by this means. The problems of
back-coupled systems referred to in (10) are probably also best studied with an
actual physical model, although a certain amount of useful simulation can be performed
in checking out the general theory before such a model is built. Problem (11)
is also of this character. Problem (12) is again of the type which will yield most
readily to simulation at this time. It is of interest in connection with possible
figure-ground mechanisms, which are included in a more general way in Problem (13).


Problems  (13)  and  (14)  are  primarily  speculative  in  character,  and 
must  await  new  insight  into  possible  mechanisms,  the  exact  nature  of  which  is 
not  yet  clear.  It  is  hoped  that  studies  of  the  other  problems,  which  are  all  well 
enough  formulated  to  be  investigated  directly,  will  suggest  possible  approaches 
to  these  two  problems,  which  represent  the  most  baffling  impediments  to  the 
advance  of  perceptron  theory  in  the  direction  of  abstract  thinking  and  concept 
formation. The previous questions are all in the nature of "mopping-up" operations
in areas where some degree of performance is known to be possible, and
where suitable mechanisms can be described, at least in qualitative terms; the
problems of figure-ground separation (or the recognition of unity) and topological
relation  recognition  represent  new  territory,  against  which  few  inroads  have  been 
made. 


The  last  problem  --  the  correlation  of  perceptron  theory  with 
biological  evidence  --  represents  at  once  an  area  of  investigation  in  its  own 
right, and a potential source of insights into solutions to the prior problems.
To date, little has been done to obtain relevant physiological data directly.
Nonetheless, several hypotheses have been suggested (cf. Chapter 26), and
a great deal of useful work along the line of Hubel's studies of the cat cortex
can be carried out using known laboratory techniques.

27.3  Construction  of  Physical  Models  and  Engineering  Applications 

From  a  purely  scientific  standpoint,  physical  models  of  particular 
perceptron  organizations  seem  to  be  indicated  only  for  relatively  advanced 
systems  (such  as  the  speech  recognition,  selective  attention,  and  program 
learning  perceptrons  referred  to  above)  where  the  theory  is  reasonably  well 
known,  but  the  actual  quantitative  behavior  under  realistic  environmental 
conditions  remains  in  doubt.  In  some  cases,  it  may  ultimately  prove  more 
economical  to  build  a  physical  model  than  to  simulate  a  highly  parallel  signal 
network  on  a  sequential  computer.  Digital  simulation,  however,  always  has 
the  advantage  of  greater  versatility  and  adaptability  to  radical  changes  in  design 
and  dynamics  of  the  simulated  network.  Its  main  difficulties  are  insufficient 
speed,  insufficient  high-speed  memory,  and  difficulty  of  programming  the 
simulation of complicated "naturalistic" environments required for some experiments.
This last disadvantage can be overcome by the design of special sensory
input devices (such as audio analyzers and flying-spot scanners) for digital
computers, and it is hoped that such equipment will be available in the near
future. While most problems can be investigated successfully in scaled-down
versions using a computer comparable to the IBM 704 or 7090, a problem
occasionally occurs which places a severe strain on the capability of even the
best digital equipment now available. The study of evolutionary models and of
adaptation processes in cross-coupled systems appears to be of this variety.
A special purpose digital computer (such as the Mark II design proposed by
C.A.L.) may ultimately prove to be the most expedient solution to these
problems, although the limits of useful simulation with conventional computers
have not yet been reached.

The  construction  of  physical  perceptron  models  of  significant  size 
and  complexity  is  currently  limited  by  two  technological  problems:  the  design 
of a cheap, mass-producible integrator, and the development of an inexpensive
means of wiring large networks of components. The Mark I (Frontispiece)
employs motor-driven potentiometers for integrators, and a large patch-panel
for connections, both intolerable solutions for very large systems. The
integrator problem is currently being attacked by groups at Aeronutronic
and Stanford Research Institute, who have developed magnetic integrators which
are suitable for alpha-system perceptrons, and at Cornell University, where an
electrochemical system is under investigation. While these approaches seem to
offer some hope of an "intermediate" solution to the problem, an ultimate
solution is more likely to come from some of the solid state work and studies
of microelectronics, such as the work of Shoulders at SRI (Reference 114).
This last technique offers a potential solution to the interconnection problem,
as well as a possible means of fabricating large numbers of digital integrators
at low cost.


Since  the  main  emphasis  in  this  volume  has  been  on  neurodynamic 
theory,  rather  than  applications,  little  has  been  said  about  the  engineering  aspects 
of the field. It is clear that if the objective of a coherent theory of brain mechanisms
is achieved, it is likely to prove applicable to pattern recognition and control
devices,  as  well  as  the  development  of  advanced  computing  systems  of  many 
varieties.  Preliminary  studies  have  been  carried  out  dealing  with  possible 
applications  of  perceptrons  to  photo-interpretation  (Reference  116)  and  the 
recognition  of  events  in  bubble  chambers  (Reference  115).  More  abstract 
applications  of  the  pattern  recognition  ability,  such  as  the  diagnosis  of  clinical 
syndromes  or  meteorological  prediction,  have  occasionally  been  proposed, 
although  little  evidence  has  been  accumulated  regarding  the  relative  suitability 
of  perceptrons  as  opposed  to  more  conventional  techniques  for  dealing  with  such 
problems. The applications most likely to be realizable with the kinds of
perceptrons described in this volume include character recognition and "reading
machines", speech recognition (for distinct, clearly separated words), and
extremely  limited  capabilities  for  pictorial  recognition,  or  the  recognition  of 
objects  against  simple  backgrounds.  "Perception"  in  a  broader  sense  may  be 
potentially  within  the  grasp  of  the  descendants  of  our  present  models,  but  a 
great  deal  of  fundamental  knowledge  must  be  obtained  before  a  sufficiently 
sophisticated  design  can  be  prescribed  to  permit  a  perceptron  to  compete  with 
a  man  under  normal  environmental  conditions. 

The  most  important  technological  development  which  may  be  inherent 
in  the  future  development  of  brain  models,  would  be  the  provision  of  "eyes  and 
ears"  for  conventional  computers  and  automata,  giving  them  a  common  universe 
of  discourse  with  their  operators.  Current  attempts  at  heuristic  problem-solving 
programs  (such  as  Newell  and  Simon's  programs)  and  at  automatic  language 
translation,  are  hampered  by  a  lack  of  common  referents  for  symbols,  which 
can  be  no  more  than  code-numbers  for  the  computer,  but  which  have  a  wealth 
of  associated  meanings  for  the  operator.  The  development  of  a  system  which,  by 
virtue  of  shared  sensory  experience,  can  "comprehend"  the  nature  of  the  physical 
referents  in  a  descriptive  statement,  is  probably  a  necessary  first  step  to  the 
creation of a truly useful problem-solving computer. Linguistic capability, related
to perceptual experience, is of the essence for an "intelligent" system, artificial
or  otherwise. 

27.4 Concluding Remarks

The  last  four  years  have  seen  the  development  of  perceptron  theory 
from  the  study  of  a  few  primitive  models  to  the  mapping  of  a  comprehensive 
field  of  investigation.  In  its  present  form,  this  theory  is  definitive  only  in  its 
treatment  of  relatively  simple  systems,  although  a  considerable  number  of  more 
advanced  systems  are  now  understood  at  least  in  a  qualitative  fashion,  and  the 
way  is  now  open  to  quantitative  studies  of  well-defined  problems. 

As  advanced  perceptron  models  become  more  sophisticated  in  their 
psychological  properties,  it  becomes  more  appropriate  to  consider  them  as 
devices  capable  of  performing  arbitrary  programs  of  observation,  response,  and 
manipulation  of  data.  As  this  condition  is  reached,  the  methodology  of  perceptron 
studies  is  likely  to  merge  with  that  of  the  "heuristic  program"  approach  to 
psychological functioning, advocated by Newell and Simon (Reference 62). In
such programs, goal-motivated behavior becomes the main object of study,
whereas in perceptrons studied to date, the behavior is motivated primarily by
the present environment and state of the system. A merger of these approaches will
not only open up new territory, but will be a sign of the "psychological maturity"
of perceptron theory, inasmuch as it will permit the study of non-trivial problems
in the psychology of thinking and problem-solving, in terms of neurodynamic
systems  of  known  physical  structure. 

On  the  other  hand,  the  "biological  maturity"  of  neurodynamic  theory 
must  await  the  solution,  or  at  least  a  more  promising  approach,  to  the  biological 
memory problem. Once this is achieved, a fruitful interaction between perceptron
theory and neurophysiology can be expected; but the memory problem remains
paramount in importance.

The theoretical approach presented in this volume is clearly a long
way  from  an  adequate  "explanation"  of  the  foundations  of  human  experience.  The 
work  will  have  fulfilled  an  important  purpose,  however,  if  it  has  succeeded  in 
conveying a recognition of the potential power of a mathematical study of neurodynamic
systems, not only for understanding the physical mechanisms of the
brain itself, but for comprehending the relationship of the cognitive process in
man to the nature of the environment in which it occurs.

APPENDICES


APPENDIX  A 


NOTATION  AND  STANDARD  SYMBOLS 

1. Notational Conventions

While  the  mathematical  notation  employed  in  this  volume  may  still  be 
capable  of  further  improvement,  several  conventions  have  been  established  which 
appear  to  work  reasonably  well.  They  include  the  following: 

(1) Individual signal-units in the perceptron are referred to by a lower
case letter to indicate the type, and a subscript to designate the
particular unit in question ($a_i$ = $i$th A-unit). Individual stimuli are
referred to by a subscripted capital ($S_i$), while stimulus sequences
are designated by script capitals ($\mathcal{S}_i$).

(2) Numbers of signal units are designated by a capital $N$, with a
subscript to indicate the type of unit in question ($N_a$ = number of A-units).
The number of stimuli is indicated by a small $n$.

(3) An asterisk is used to denote activity; $u_i^*$ = activity state (or
output signal) of the unit $u_i$; $N_a^*$ = number of active A-units;
$c_{ij}^*$ = signal transmitted by connection $c_{ij}$.

(4) Sets of units may be designated either by a subscripted capital or
by a functional notation. For example, the set of A-units responding
to stimulus $S_i$ may be designated either by $A_i$ or by $A(S_i)$.

(5) Where it is necessary to refer both to the unit receiving a signal
and to the stimulus for which the signal occurs, a tensor notation
is employed, with the signal unit indicated by a subscript and the
stimulus by a superscript. For example, $\alpha_i^j(t)$ = input signal to the
$i$th unit from the $j$th stimulus at time $t$. An obvious extension
would permit this notation to be applied to origins as well as
termini of signals; thus $c_{ij}^{*k}(t)$ would designate the signal transmitted
to unit $j$ from unit $i$ in response to stimulus $S_k$ at
time $t$.

(6) Whenever pairs of subscripts are used to designate a signal or
connection (as in $c_{ij}$) the first subscript indicates the origin,
and the second the terminus. In generalization coefficients
($g_{ij}$), the first subscript indicates the "recipient" and the
second subscript indicates the "source" stimulus.

(7) In multi-layer systems, the layers are counted separately for
each type of unit, and the number of the layer may be denoted
by a superscript in parentheses (e.g., $N_a^{(2)}$ = number of units
in the second association layer; $r_i^{(3)}$ = $i$th R-unit of the third
R-unit layer).

Matrix  and  vector  notations,  where  employed,  follow  usual  conventions, 
the  particular  symbols  being  defined  in  the  text  where  they  appear.  The  symbol 
$\delta$, when it appears without subscripts, indicates a decay rate, and should not be
confused with Kronecker's delta, which appears only with subscripts ($\delta_{ij}$), or with
Dirac delta-functions, $\delta(x)$, for which the functional notation is always used.

2.  Standard  Symbols 

The  following  list  includes  those  symbols  which  are  used  consistently 
throughout  the  text.  A  number  of  additional  symbols  are  occasionally  employed 
for  convenience  in  particular  expositions,  and  are  defined  where  they  occur. 

$u_i$ = generic symbol for the $i$th signal-unit of a perceptron, or, in
simple perceptrons, signal to the R-unit from stimulus $S_i$.

$s_i$ = $i$th sensory unit
$a_i$ = $i$th association unit
$r_i$ = $i$th response unit
$c_{ij}$ = connection from unit $i$ to unit $j$.
$s_i^*$ = output signal from $s_i$.
$a_i^*$ = output signal from $a_i$.
$r_i^*$ = output signal from $r_i$.

7r^l  -  sequence  of  response  states  occurring  as  outputs  of  a  perceptron. 

$c^{*}_{ij}$ = signal transmitted to unit $j$ from unit $i$, on connection $c_{ij}$
(measured at point of arrival at the terminal unit).

$\tau_{ij}$ = transmission time of connection $c_{ij}$

$v_{ij}$ = value of connection $c_{ij}$ (occasionally abbreviated to $v_i$ in simple
perceptrons, indicating the value of the connection from $a_i$ to
the R-unit).

$N_s$ = number of S-units
$N_a$ = number of A-units
$N_r$ = number of R-units

=  total  input  signal  to  the  unit.  The  signal  due  to  stimulus  Sj 
is  designated  either  by  't;  by)  or  by  o'"'  .  If  the  tensor  notation 
is  employed,  then  '>■;  designates  the  vector  of  signals  (oil ,  ,  Oi.-), 

Similarly,  may  be  used  to  designate  the  vector  (rx( ,  a-i,..-,  <x^'  '. 

$\beta_i$ = component of $\alpha_i$ consisting of the sum of all signals originating
from the S-units.

$\gamma_i$ = component of $\alpha_i$ consisting of the sum of all signals originating
from the A-units.

(The  vectors  /i;  ,  ,  and  7'"'^  are  defined  analogously  to  the  correspond¬ 

ing  ^  vectors . ) 

$\phi(\alpha)$ = functional notation for activity state of a simple A-unit. $\phi = 1$
if $\alpha \ge \theta$, 0 otherwise.

$x$ = number of excitatory input-connections to an A-unit
$y$ = number of inhibitory input-connections to an A-unit

$\theta$ = threshold (specifically, $\theta_i$ = threshold of $i$th unit)

$S_i$ = $i$th stimulus

-  '  sequence  of  stimuli 

r  .  th 

c/;'  =  /  sequence  of  stimuli  up  to,  but  not  including,  the  terminal 

stimulus 

$R_i$ = normalized retinal area (or fraction of sensory points) covered by $S_i$

$C_{ij}$ = common area (retinal intersection) of stimuli $S_i$ and $S_j$

$W$ = stimulus world, or universe

$n$ = number of stimuli in $W$

$N$ = number of admissible stimulus sequences, consisting of stimuli
in $W$

$C(W)$ = classification of stimuli in $W$ into two or more equivalence
classes.

$R(W)$ = response function, assigning possible R-unit states to each
stimulus in $W$

$\rho_j$ = sign of classification of stimulus $S_j$ (+1 or -1) in a binary
classification, $C(W)$

$\eta$ = increment of reinforcement per connection (typically $\pm 1$ or 0,
in quantized systems)

$\delta$ = decay rate, generally applied to decaying values, but occasionally
used in connection with other quantities which are subject to
exponential decay.

$g_{ij}$ = generalization coefficient; the change in the signal to an R-unit
for stimulus $S_i$ as a result of applying a unit of positive
reinforcement ($\eta = +1$) for stimulus $S_j$

$G$ = matrix of generalization coefficients, $g_{ij}$

$Q_i$ = probability that an A-unit, in a given class of perceptrons,
responds to stimulus $S_i$

.  ■  =  \probability  that  a  lif  layer  A-unit  responds  to  5: 

\  ■  I  h 

\  =  probability  that  an  A-unit  responds  to  the  P  stimulus  in 

sequence  J’ 

$Q_{ij}$ = probability that an A-unit responds both to $S_i$ and to $S_j$

th  ^  p 

C-  Cl-  =  probability  that  an  A-unit  responds  both  to  the  zx  stimulus  of 
and  to  the  i'  stimulus  of  rJj 

(The probability of joint response for an arbitrary number of stimuli, $Q_{ijk\ldots}$,
is similarly defined. When it is understood that the environment consists of
stimulus sequences, as in discussions of cross-coupled perceptrons, the
subscripts of the Q-functions are always understood to refer to stimulus
sequences, rather than individual stimuli.)

$\bar{x}$ = mean of the random variable $x$

$E[x]$ = expected value of $x$

$\sigma(x)$ = standard deviation of $x$

$P$ = probability, particularly probability of correct performance in
a given experiment.

$P(x)$ = notation commonly used for the probability that the random
variable $\mathbf{x}$ has the value $x$; equivalent to $P(\mathbf{x} = x)$
$T(S)$ = the transform obtained by applying transformation $T$ to
stimulus $S$

t  -  time 

$T$ = number of stimuli (or duration, in units $\Delta t$) in a training
sequence

$\alpha$ and $\gamma$ as prefixes indicate types of reinforcement systems.

r.c.s. = reinforcement control system.
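
As an illustration of how the symbols for the total input signal, the threshold, and the excitatory and inhibitory connection counts fit together, the activity state of a simple A-unit can be sketched in a few lines of Python. This is only a sketch under stated assumptions: the function name `a_unit_state` and the representation of connections as lists of S-point indices are hypothetical, and every connection is assumed to carry a signal of +1 (excitatory) or -1 (inhibitory) when its S-point is active.

```python
def a_unit_state(active_points, excitatory, inhibitory, theta):
    """Return the activity state (1 or 0) of one simple A-unit.

    active_points: set of S-point indices turned on by the stimulus.
    excitatory, inhibitory: lists of S-point indices, one entry per
        connection, so their lengths play the roles of x and y.
    theta: threshold of the unit.
    Assumption (not from the text): all connections have unit magnitude.
    """
    e = sum(1 for p in excitatory if p in active_points)  # active excitatory inputs
    i = sum(1 for p in inhibitory if p in active_points)  # active inhibitory inputs
    alpha = e - i                                         # total input signal
    return 1 if alpha >= theta else 0                     # threshold rule


# Hypothetical unit: x = 3 excitatory and y = 1 inhibitory connections.
stimulus = {0, 1, 2}
print(a_unit_state(stimulus, excitatory=[0, 1, 5], inhibitory=[3], theta=2))
```

Averaging this state over many randomly wired units for a fixed stimulus would give an empirical estimate of the response probability that the Q-functions describe.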


APPENDIX  B 


LIST  OF  THEOREMS  AND  COROLLARIES 


This appendix contains those results which have been explicitly
stated in the form of theorems, for convenient reference. Theorems are
numbered by chapter and theorem number, in the order in which they
originally appear.

THEOREM 5.1: Given a retina with two-state (on or off) input signals,
the class of elementary perceptrons for which a solution exists to
every classification, $C(W)$, of possible environments, $W$, is
non-empty.

THEOREM 5.2: Given an elementary perceptron and a classification
$C(W)$, the following conditions are necessary but not sufficient for
a solution to $C(W)$ to exist:

i)  every  stimulus  must  activate  at  least  one  A-unit; 

ii)  there  should  be  no  subset  of  stimuli  containing  at  least 
one  member  of  each  class,  such  that  in  the  union  of  the 
responding  A-unit  sets,  every  A-unit  has  the  same  bias 
ratio  (with  respect  to  the  stimuli  of  the  subset). 

THEOREM 5.3: Given an elementary $\alpha$-perceptron, a stimulus world
$W$, and any classification $C(W)$: then in order for a solution to $C(W)$
to exist, it is necessary and sufficient that there exist some
vector $u$ in the same orthant as $C(W)$, and some vector $x$ such
that $Gx = u$.

COROLLARY 1: Given an elementary perceptron and a stimulus world
$W$, then if $G$ is singular, some $C(W)$ exists for which there is no
solution.

COROLLARY 2: Given an elementary perceptron, if the number of
stimuli in $W$ is $n > N_a$ there is some $C(W)$ for which no solution
exists.
COROLLARY 3: For any elementary perceptron, as the number $n$ of
stimuli in $W$ increases, the probability that a randomly selected
classification, $C(W)$, has a solution approaches zero (where $C(W)$
is chosen from a uniform distribution over the possible
classifications of $W$).

THEOREM 5.4: Given an elementary $\alpha$-perceptron, a stimulus world
$W$, and any classification $C(W)$ for which a solution exists; let all
stimuli in $W$ occur in any sequence, provided that each stimulus
must reoccur in finite time; then beginning from an arbitrary
initial state, an error correction procedure (quantized or
non-quantized) will always yield a solution to $C(W)$ in finite time, with all
signals to the R-unit having magnitudes at least equal to an arbitrary
quantity $\delta \ge 0$.

COROLLARY: Given an elementary perceptron, a stimulus world $W$,
and any classification $C(W)$; then if a solution to $C(W)$ exists, the set
of possible solutions to $C(W)$ has positive measure over the phase
space of the perceptron.
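
The error correction procedure of Theorem 5.4 can be sketched in terms of the generalization matrix. The sketch below is illustrative only and rests on several assumptions not taken from the text: the names `error_correction` and `r_unit_signals` are hypothetical, the matrix `G` and the classification `rho` are made-up examples, reinforcement is quantized with unit increments in the direction of the desired class, the initial state is zero, stimuli are presented cyclically, and a zero signal is counted as an error.

```python
def r_unit_signals(G, x):
    """Signal to the R-unit for each stimulus: u = G x."""
    return [sum(g * xj for g, xj in zip(row, x)) for row in G]


def error_correction(G, rho, max_passes=1000):
    """Cyclic, quantized error correction expressed on the matrix G.

    G[i][j]: change in the signal for stimulus i per unit reinforcement
             applied for stimulus j.
    rho[i]:  desired sign (+1 or -1) for stimulus i.
    Returns x, where x[i] is the net reinforcement applied for stimulus i.
    """
    n = len(G)
    x = [0] * n                      # zero initial state (an assumption)
    for _ in range(max_passes):
        corrected = False
        for i in range(n):           # every stimulus reoccurs each pass
            u_i = sum(G[i][j] * x[j] for j in range(n))
            if rho[i] * u_i <= 0:    # wrong or undecided response
                x[i] += rho[i]       # reinforce toward the desired class
                corrected = True
        if not corrected:            # a full pass without errors: solved
            return x
    raise RuntimeError("no solution reached within max_passes")


# Made-up example: three stimuli with overlapping A-unit sets.
G = [[2, 1, 0],
     [1, 2, 1],
     [0, 1, 2]]
rho = [1, -1, 1]
x = error_correction(G, rho)
print(x, r_unit_signals(G, x))
```

The `max_passes` guard exists only because the sketch cannot know in advance whether a solution exists for an arbitrary `G` and `rho`; the theorem's guarantee applies when one does.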

THEOREM 5.5: Given an elementary $\alpha$-perceptron with a finite number
of memory states, a random-sequence stimulus world $W$, and any
classification $C(W)$ for which a solution can be reached from the
starting point by some reinforcement sequence, then a solution
will be obtained in finite time with probability 1 by means of a
random-sign correction procedure.

THEOREM 5.6: Given an elementary $\alpha$-perceptron, a stimulus world
$W$, and some classification $C(W)$ for which a solution exists, a
solution can sometimes be achieved by an S-controlled reinforcement
procedure. However, such a solution cannot be guaranteed
for an arbitrary stimulus sequence, and may be unstable if it
occurs.
THEOREM 5.7: Given an elementary perceptron with a finite number of
memory states, a stimulus world $W$, and a classification $C(W)$ for
which a solution can be reached from the starting point by some
reinforcement sequence, then a solution can always be obtained in
finite time by means of a random perturbation correction procedure.

THEOREM 5.8: Given an elementary $\gamma$-perceptron, a stimulus world
$W$, and a classification $C(W)$, it is possible that a solution to $C(W)$
exists which cannot be achieved by the perceptron.

THEOREM  5.9:  Given  an  r/ -perceptron,  and  a  classification  C(’(vl, 

a  necessary  and  sufficient  condition  that  the  error  correction 

procedure  reach  a  solution  (in  finite  time,  with  arbitrary  starting 

point)  is  that  there  exists  no  non-zero  vector  /  (whose  components 

# 

do  not  disagree  in  sign  with  )  such  that  h’ X  =  0  for  all  I 

(where  is  the  bias  number,  defined  as  in  Chapter  5). 

COROLLARY:  For  an  rx -system,  the  condition  that  there  exist  no  non- 
zero  vector  /'f  such  that  =  -  for  all  /’  is  equivalent  to  the 
condition  that  there  exist  7  and  U  such  that  GZ  -  H  (where  U  is 
in  the  same  orthant  as  C{i'V))  . 

THEOREM  5.  10:  Given  a  -perceptron,  and  a  classification  C(^v),  a 

necessary  and  sufficient  condition  that  the  error  correction  procedure 
reach  a  solution  (in  finite  time)  is  that  there  exists  no  non-zero 
such  that  /;■  i  =  -  for  all 

COROLLARY:  For  a  .^'-system,  the  condition  that  there  exist  no  non¬ 
zero  vector  /^^such  that  -  c  for  all  I  is  equivalent  to  the 

condition  that  there  exist  Y  and  7  such  that  ■  '  -  t)  (where  U  is 
in  the  same -orthant  as  C.{\N))  . 

THEOREM 7.1: Given a class of elementary $\alpha$-perceptrons, a finite
stimulus world $W$, a classification $C(W)$, and a training sequence;
then for every $\epsilon > 0$, there exists an $N$ such that if $N_a > N$,
the probability of selecting a perceptron which will correctly
identify the class of every positive stimulus will be greater than
$1 - \epsilon$.

(See Page 157 for definition of positive stimulus.)


THEOREM 9.1: In a bounded $\alpha$-perceptron, with S-controlled
reinforcement, the probability distribution $\pi(v)$ (for the value of a particular
connection) approaches a stable terminal distribution of the form

^  /I)  \  ^  , 

iT(o-)  -  where  xT  is  a  normalization  constant  equal  to 


/  -  (p/q  ^ 


THEOREM 10.1: Given a completely linear perceptron, a stimulus
world $W$, and a classification $C(W)$ such that the bias ratio of
every S-unit is equal (and non-zero), no solution to $C(W)$ can exist.


THEOREM 10.2: Given a simple $\alpha$-perceptron with simple A-units,
an R-unit with a continuous monotonic sign-preserving signal
generating function, a stimulus world $W$ (in which each stimulus
ultimately reoccurs) and any response function $R(W)$ for which a
solution exists, then by means of the error-corrective reinforcement
procedure, the given response function can always be
approximated in finite time by an output vector $R(W) + \varepsilon$, where
$\varepsilon$ is a vector $(\varepsilon_1, \ldots, \varepsilon_n)$, $|\varepsilon_i| < \epsilon$, where $\epsilon$ may be an
arbitrarily small quantity greater than zero.


LEMMA 1: Given a symmetric positive definite or positive
semi-definite matrix, $H$, and any vector $y$, then $y' H y = 0$ only
if $H y = 0$.

LEMMA 2: For the same conditions as Theorem 10.2, given that a
solution exists, the set of all solutions forms a hyperplane of
dimension equal to the nullity of $G$.
COROLLARY 1: For the conditions of Theorem 10.2, and a phase space
which is unbounded in all dimensions, the probability of convergence
to an arbitrarily close approximation to $R(W)$ by means of a
random-sign correction procedure or a random-perturbation correction
procedure may be less than 1.

COROLLARY 2: Given the conditions of Theorem 10.2, and a phase space
bounded in all dimensions, then (given that a solution to $R(W)$ exists
in this bounded space) the response function can always be approximated
by means of the random-sign correction procedure, the system converging
in finite time to an approximation $R(W) + \varepsilon$, $\varepsilon$ a vector, where
$|\varepsilon_i| < \epsilon$ for arbitrarily small $\epsilon > 0$.

COROLLARY  3:  Given  the  same  conditions  as  Corollary  2,  the  response 
function  can  always  be  approximated  by  the  random-perturbation 
correction  procedure,  the  system  converging  in  finite  time  to  an 
approximation  RwNI+f.  ,  •"  having  components  of  magnitude  \^i\  ^  kl 
if  the  reinforcement  is  quantized,  or  .C;j  6  -  0,  if  P  is  chosen 

from  a  continuous  distribution  around  zero, 

THEOREM 10.3: Given a simple perceptron with a simple R-unit, and
with transmission functions for all A-R connections of the form
$f(\alpha_i)\, v_i$, where $f$ is any function, and given the existence of a
solution to a classification function $C(W)$ for this perceptron, then
if $p(v)$ is any polynomial of odd degree in $v$, there also exists a
solution if the transmission function is changed to $f(\alpha_i)\, p(v_i)$.

THEOREM  10,4;  Given  the  perceptron  of  Theorem  10,  3,  if  a  solution 
exists  for  some  transmission  function  a  solution  does  not 

necessarily  exist  for  the  transmission  function  ,  g  P  -F  . 

THEOREM 10.5: Given a simple perceptron with A-R connections which
differ in their transmission functions, or with uniform transmission
functions but non-simple A-units, a response function $R(W)$ may
have a solution which is unattainable by either the error correction
procedure or the random-sign correction procedure.

THEOREM  10,6:  Given  a  simple  perceptron  with  any  mixture  of  trans¬ 
mission  functions  connections  ^'jr  >  ^.nd  a  response 

function  for  which  a  solution  exists;  then  there  exists  some- 

transmission  function  q(cy.,'u)  which  is  uniform  for  all  connections, 
such  that  a  solution  to  l<>(w)  exists. 

THEOREM  10,  7:  Given  a  simple  perceptron  with  an  R-unit  which  is  either 
simple  or  has  a  continuous  signal  generating  function,  and  with  any 
combination  of  transmission  functions  from  its  A-units  (all  continu¬ 
ous  functions  of  ,  equal  to  zero  if  -0),  and  given  a  bounded 
phase  space  within  which  a  solution  exists  for  P(w)  then,  if  each 
stimulus  in  /V  ultimately  reoccurs,  an  approximate  solution  P(W)  +  i 
is  always  obtainable  in  finite  time  by  the  random -perturbation 
correction  procedure, 

THEOREM 12.1: Given a perceptron with more than one R-unit, and a
response function $R(W)$ or a classification $C(W)$ for which a solution
exists, it may be impossible to achieve this solution by an error
correction procedure which applies negative reinforcement jointly
to all R-units based on errors in their joint response.

THEOREM 13.1: Given a three-layer series-coupled perceptron with
simple A and R-units and variable S-A connections, and a
classification $C(W)$ for which a solution exists, it may be impossible to
achieve a solution by any deterministic correction procedure which
obeys the local information rule.

THEOREM 13.2: Given a three-layer series-coupled perceptron, with
simple A and R-units, variable-valued S-A connections, bounded
A-R values, and a classification $C(W)$ for which a solution exists,
then a solution to $C(W)$ can be obtained in finite time with
probability 1 by means of a back-propagating error-correction
procedure, given that each stimulus in $W$ always reoccurs in
finite time, and that probabilities $P_1$, $P_2$, and $P_3$ are all greater
than 0 and less than 1.

(See Section 13.3 for definition of the back-propagating correction
procedure.)


APPENDIX  C 


BASIC  EQUATIONS 


The following equations are those most likely to be referred to
repeatedly, and are listed here in a somewhat different order from their
appearance in the text.


(1) Generalization Coefficients

For an $\alpha$-system,

$$g_{ij} = n_{ij}$$

$$g_{ij} = Q_{ij} \quad \text{(normalized form)}$$

For a $\gamma$-system,

$$g_{ij} = n_{ij} - \frac{1}{N_a}\, n_i\, n_j$$

$$g_{ij} = Q_{ij} - Q_i Q_j \quad \text{(normalized form)}$$


(2) R-unit Input Signals

For an $\alpha$- or $\gamma$-system,

$$u = G x$$

where $u$ is the vector of R-unit input signals, and $x$ is the vector
$(x_1, x_2, \ldots, x_n)$ ($x_i$ being the number of times $S_i$ has been
reinforced).
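
The two relations above can be illustrated with a small numerical sketch. It assumes a reading of the coefficients in which the unnormalized value for an alpha-system is the number of A-units responding to both stimuli, and the gamma-system value subtracts the product of the two individual response counts divided by the number of A-units (which agrees with the normalized forms). The function names and the activity table `act` are hypothetical and are not taken from the text.

```python
def alpha_coefficients(act):
    """g[i][j] = number of A-units active for both stimulus i and stimulus j.

    act[i][k] is 1 if A-unit k responds to stimulus i, else 0.
    """
    n = len(act)
    return [[sum(a * b for a, b in zip(act[i], act[j])) for j in range(n)]
            for i in range(n)]


def gamma_coefficients(act):
    """g[i][j] = n_ij - n_i * n_j / N_a (assumed unnormalized gamma form)."""
    n_a = len(act[0])                     # number of A-units
    counts = [sum(row) for row in act]    # n_i: units responding to stimulus i
    g = alpha_coefficients(act)
    n = len(act)
    return [[g[i][j] - counts[i] * counts[j] / n_a for j in range(n)]
            for i in range(n)]


def input_signals(G, x):
    """R-unit input signals u = G x for reinforcement counts x."""
    return [sum(g * xj for g, xj in zip(row, x)) for row in G]


# Made-up table: 3 stimuli, 4 A-units.
act = [[1, 1, 0, 0],
       [0, 1, 1, 0],
       [0, 0, 1, 1]]
G_alpha = alpha_coefficients(act)
G_gamma = gamma_coefficients(act)
x = [1, 0, -1]   # net reinforcement: +1 for stimulus 0, -1 for stimulus 2
print(G_alpha, input_signals(G_alpha, x))
print(G_gamma, input_signals(G_gamma, x))
```

Dividing either matrix by the number of A-units gives the empirical counterpart of the normalized forms for this one table.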


(3)  Q-Functions 


For individual stimuli, in a simple perceptron,

$$Q_i = \sum_{E=\theta}^{E_{\max}} \; \sum_{I=0}^{E-\theta} P_x(E)\, P_y(I)$$

where

$$E_{\max} = \begin{cases} x & \text{for binomial model} \\ \infty & \text{for Poisson model} \end{cases}$$

$P_x(E)$ = probability that $E$ excitatory connections to an
A-unit originate from active S-points (see
Equations 6.2 and 6.3)

$P_y(I)$ = probability that $I$ inhibitory connections to an
A-unit originate from active S-points (see
Equations 6.2 and 6.3)
$E_i + E_c - I_i - I_c \ge \theta$
$E_j + E_c - I_j - I_c \ge \theta$

where  and  P  are  defined  by  Equations  6.  6  and  6.  7,  for  binomial  and 
Poisson  models. 

For series-coupled perceptrons with distributed transmission
times, see Sections 11.1 and 11.2 for prototype equations.

For multi-layer series-coupled systems, Q-functions for the
$\nu$th layer can be computed by the approximation described in Section 15.1.

For  s imilarity -constrained  four -layer  perceptrons,  Q-lj  for 

two  random  or  unrelated  stimuli  is  given  by: 

,  *  '  .  ■!),  '""I  ' 

/  i  ^  ) 

where  m  is  the  number  of  A'  units  connected  to  each  /I  unit. 


For  a  stimulus  5/  ,  and  its  transform  S-'  ,  in  a  similarity- 

constrained  model, 

r,  '  _  (2) 

.r  -*  f  rr' 

where  v-  and  '■  y\;  can  be  approximated  by  Equations  15.5 

and  15.  8  for  the  case  of  random  stimulus  patterns  in  a  finite  retina.  In  an 
infinite  retina,  with  random  stimuli,  O---  .  For  coherent  stimuli  and 


assuming  '  to  be  a  topological  transformation, 


rn  -  / 
.  -  i 


O  il 


(!) 


fn  -  t 


I-  /- 


Q 


(0 


(!) 


where  'cj  is  the  order  of  the  transformation  group,  and  given 

by  Equation  15.  6.  A  particular  solution  for  the  case  of  square  stimuli  can 
be  found  in  Equation  15.  15. 


For cross-coupled perceptrons with fixed connections, $Q_i$ and
$Q_{ij}$ are given by Equations 18.1 and 18.2, respectively.

For adaptive four-layer and cross-coupled systems, the terminal
values of the Q-functions are obtained as a product of the iterative procedures
described in Chapters 16, 17, and 19, and take the form:




Q;;  -  2^  P(/3^)  0(/3^  Td  ^ 

(4) Equations for Learning Performance

For  an  error  correction  procedure,  an  upper  bound  on  the 
number  of  corrections  that  will  be  required  to  achieve  a  solution  from  zero 
initial  conditions  is  given  by 
N  ^  n  M/ oL- 

where  n  is  the  number  of  stimuli  in  W  ,  M  is  the  maximum  diagonal 
element  g-^  ,  and  cx^  is  the  minimum  of  the  function  x ||  ^  as  defined 
for  Theorem  4,  Chapter  5.  For  a  more  general  bound,  see  Equation  7.  12. 


For  an  S-controlled  learning  procedure,  in  an  elementary 
perceptron,  a  bound  on  the  error  probability  for  a  "positive  stimulus"  S^y 
is  given  by 

An  improved  estimate  of  the  probability  of  correct  response,  employing  a 
normal  distribution  assumption,  is  given  by  Equation  7.  7, 


For  fixed  training  sequences. 


E(u.z)  ^ 


for  an  cc  -system 


L  Pj  ( fo’’  a  /■  or  /'-system 


rr  0/ r)  -  ^  JL  H.  Pj  P^  ^  ^ 


for  an  rx -system,  and 


<y  « 

-  [  Q^Ij.  ~  Qj  Q  r)  z~  ^  ^  ^  X  ) 

for  a  /'-system.  The  equation  for  a  true  -system  is  given  in  Equation  8.  7. 

For  random  training  sequences,  t'Hylis  as  above,  and  the  variances 
are  given  by  Equation  7.  11  for  an  (X -system,  and  Equation  8,  14  for  a  /  -system. 
(5) Steady-State Equations for Four-Layer and Cross-Coupled Systems


For an adaptive four-layer $\alpha$-perceptron (Chapter 16), the
terminal values of the signals transmitted by the variable-valued
connections are given by iterating the equation:

n 


r, 


(p  f  I ) 


NaJ 

(T 


L  ci:  r;l,) 


!  (  // 

where  /  -  O  and  C-  ■  -  ^  being  the  frequency  of  the 

sequence  5^5  •)•  This  equation  will  converge  in  at  most  n  steps  to  the 
terminal  value  of  .  Equations  for  '/  and  P  -systems  are  presented 

in  Chapter  16. 


For an open-loop cross-coupled system, the above iteration
equation applies without modification.

For a closed-loop cross-coupled $\alpha$-perceptron, the iteration
equation becomes


r 


,  r/ 


L 

V 


P.^  0(/P  ^ 


which  is  specific  to  the  A-unit, 

,r'  -vector  /j- .  The  solutions  for  p 
Chapter  19. 


PiP  L  M/p  piPpp  rpi 

J 

or  to  the  set  of  A-units  having  the 
and  /'  -systems  are  discussed  in 
APPENDIX D


STANDARD  DIAGNOSTIC  EXPERIMENTS 


A number of experiments have been described in the course of
the text which are employed for comparison and evaluation of different
perceptron models. Those experiments which are referred to by number are listed
here for convenience in cross-referencing figures and discussions in the text.

EXPERIMENT 1: Horizontal/vertical bar discrimination, in 20 by 20
toroidally connected retina, with 4 by 20 bars. Stimuli occur in
fixed sequence. S-controlled reinforcement is employed.
(see Page 162)

EXPERIMENT 2: Same environment and procedure as Experiment 1, but
with alternating positions in opposite classes. (see Page 164)

EXPERIMENT 3: Same as Experiment 1, but with stimuli occurring in
random sequence. (see Page 170)

EXPERIMENT 4: Same as Experiment 3, but horizontal bars occur four
times as frequently as vertical bars. (see Page 170)

EXPERIMENT 5: Same as Experiment 1, but with error-correction
reinforcement. (see Page 173)

EXPERIMENT 6: Same as Experiment 5, but with stimuli occurring in
random sequence. (see Page 173)

EXPERIMENT 7: Triangle/Square discrimination experiment, with
error-correction procedure, in 20 by 20 retina. Random sequence, with
stimuli occurring in all translational positions with equal probability.
(see Page 173)

EXPERIMENT 8: Horizontal/vertical bar discrimination, with random
sequences, and random-sign correction procedure. (see Page 176)

EXPERIMENT 9: Horizontal and vertical bars in random sequence, with
R-controlled reinforcement. (see Page 214)

EXPERIMENT 10: "Spontaneous organization" experiment, with an
environment of n stimuli, such that all pairs have equal intersections. The
stimuli are divided into two classes, and the perceptron is exposed to
a preconditioning sequence in which the transition probability between
members of the same class is large, and the transition probability
between classes is small. At the end of the preconditioning sequence,
R-controlled reinforcement is applied for a brief period. (see
Page 365)

EXPERIMENT 11: "Transformation learning" experiment, in which the
perceptron is exposed to an alternating preconditioning sequence of stimuli and
their transforms. After the preconditioning period, the perceptron
is taught to discriminate two test stimuli, which were not previously
seen, and is then tested on their transforms. (see Page 375)

EXPERIMENT 12: The preconditioning sequence consists of a repetitive
sequence  of  four  stimuli,  with  spatial  relationships  favoring  the 
dichotomy  vs  (S^yS^),  while  temporal  association  favors  fS,,  S^) 

vs  j.  The  Q-matrix  is  evaluated  at  the  end  of  the  preconditioning 

period,  (see  Page  393) 

EXPERIMENT 13: "Sequence prediction" experiment. The preconditioning
procedure uses a finite sequence-environment with the same stimuli as
in Experiment 12, but the perceptron is tested (in addition) with the
stimulus $S_1$ followed by a sequence of null stimuli, and the Q-matrix
for all subsequences is obtained. (see Page 445)

EXPERIMENT 14: Preconditioning procedure with same stimuli as in
Experiment 12, but with each stimulus repeated two times whenever
it occurs. The terminal Q-matrix for all subsequences is determined.
(see Page 450)

EXPERIMENT 15: Selective attention experiment, for a four R-unit
perceptron trained to discriminate shapes and retinal positions of stimuli,
and then tested with complex stimuli combining two shapes and two
positions simultaneously. (see Page 478)

EXPERIMENT 16: Selective attention in an audio-visual perceptron,
trained to discriminate shapes and positions as in Experiment 15, but
biased by the addition of an auditory name for the shape or position
of part of the stimulus pattern. (see Page 482)
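
For readers who wish to reproduce the stimulus environment of Experiment 1, the bar patterns can be generated as sets of retinal points. The following is a minimal sketch under stated assumptions: the names `horizontal_bar`, `vertical_bar`, and `bar_environment` are hypothetical, bars are assumed to occupy every one of the 20 possible positions with wrap-around (as the toroidal connection suggests), and the assignment of +1 to horizontal and -1 to vertical bars is arbitrary.

```python
SIZE = 20    # retina is SIZE by SIZE sensory points
WIDTH = 4    # bar thickness; each bar spans the full retina lengthwise


def horizontal_bar(top):
    """Points of a 4-by-20 horizontal bar whose first row is `top`."""
    rows = [(top + k) % SIZE for k in range(WIDTH)]   # wrap around the torus
    return frozenset((r, c) for r in rows for c in range(SIZE))


def vertical_bar(left):
    """Points of a 4-by-20 vertical bar whose first column is `left`."""
    cols = [(left + k) % SIZE for k in range(WIDTH)]  # wrap around the torus
    return frozenset((r, c) for r in range(SIZE) for c in cols)


def bar_environment():
    """All bar stimuli with class labels (+1 horizontal, -1 vertical)."""
    stimuli = [(horizontal_bar(p), +1) for p in range(SIZE)]
    stimuli += [(vertical_bar(p), -1) for p in range(SIZE)]
    return stimuli


env = bar_environment()
print(len(env), len(env[0][0]))
```

A fixed-sequence experiment would iterate over `env` in order, while the random-sequence variants would sample from it, with unequal weights where one class is to occur more frequently.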